Practical rules for Django MiddleWare ordering? - django

The official documentation is a bit messy: 'before' & 'after' are used for ordering MiddleWare in a tuple, but in some places 'before'&'after' refers to request-response phases. Also, 'should be first/last' are mixed and it's not clear which one to use as 'first'.
I do understand the difference.. however it seems to complicated for a newbie in Django.
Can you suggest some correct ordering for builtin MiddleWare classes (assuming we enable all of them) and — most importantly — explain WHY one goes before/after other ones?
here's the list, with the info from docs I managed to find:
UpdateCacheMiddleware
Before those that modify 'Vary:' SessionMiddleware, GZipMiddleware, LocaleMiddleware
GZipMiddleware
Before any MW that may change or use the response body
After UpdateCacheMiddleware: Modifies 'Vary:'
ConditionalGetMiddleware
Before CommonMiddleware: uses its 'Etag:' header when USE_ETAGS=True
SessionMiddleware
After UpdateCacheMiddleware: Modifies 'Vary:'
Before TransactionMiddleware: we don't need transactions here
LocaleMiddleware, One of the topmost, after SessionMiddleware, CacheMiddleware
After UpdateCacheMiddleware: Modifies 'Vary:'
After SessionMiddleware: uses session data
CommonMiddleware
Before any MW that may change the response (it calculates ETags)
After GZipMiddleware so it won't calculate an E-Tag on gzipped contents
Close to the top: it redirects when APPEND_SLASH or PREPEND_WWW
CsrfViewMiddleware
Before any view middleware that assumes that CSRF attacks have been dealt with
AuthenticationMiddleware
After SessionMiddleware: uses session storage
MessageMiddleware
After SessionMiddleware: can use Session-based storage
XViewMiddleware
TransactionMiddleware
After MWs that use DB: SessionMiddleware (configurable to use DB)
All *CacheMiddleWare is not affected (as an exception: uses own DB cursor)
FetchFromCacheMiddleware
After those those that modify 'Vary:' if uses them to pick a value for cache hash-key
After AuthenticationMiddleware so it's possible to use CACHE_MIDDLEWARE_ANONYMOUS_ONLY
FlatpageFallbackMiddleware
Bottom: last resort
Uses DB, however, is not a problem for TransactionMiddleware (yes?)
RedirectFallbackMiddleware
Bottom: last resort
Uses DB, however, is not a problem for TransactionMiddleware (yes?)
(I will add suggestions to this list to collect all of them in one place)

The most difficult part is that you have to consider both directions at the same time when setting the order. I would say that's a flaw in the design and I personally would opt for a separate request and response middleware order (so you wouldn't need hacks like FetchFromCacheMiddleware and UpdateCacheMiddleware).
But... alas, it's this way right now.
Either way, the idea of it all is that your request passes through the list of middlewares in top-down order for process_request and process_view. And it passes your response through process_response and process_exception in reverse order.
With UpdateCacheMiddleware this means that any middleware that changes the Vary headers in the HTTP request should come before it. If you change the order here than it would be possible for some user to get a cached page for some other user.
How can you find out if the Vary header is changed by a middleware? You can either hope that there are docs available, or simply look at the source. It's usually quite obvious :)

One tip that can save your hair is to put TransactionMiddleware in such place on the list, in which it isn't able to rollback changes commited to the database by other middlewares, which changes should be commited no matter if view raised an exception or not.

Related

usage of require_POST in django

I wanna know is there a usage or security tip to use required-post or require-get decorators in django?
from django.views.decorators.http import require_GET, require_POST
#require_GET [Django-doc], and #require_POST [Django-doc] are special cases of #require_http_methods(…) [Django-doc]. These enforce that you can only call a view with certain HTTP methods.
The HTTP protocol specifies that requests that "safe methods", for example a GET request should not have side effects [rfc-editor.org]:
Request methods are considered "safe" if their defined semantics are essentially read-only; i.e., the client does not request, and does not expect, any state change on the origin server as a result of applying a safe method to a target resource.
(…)
Of the request methods defined by this specification, the GET, HEAD, OPTIONS, and TRACE methods are defined to be safe.
This thus means that you should not create, update, remove, etc. records with a GET request for example. A "search spider" for example will normally assume that a GET request is safe. If you have a view that deletes an item through a GET request, then the spider might accidentally remove an entry.
By using such decorators, you thus require the requests to have a certain HTTP method.

Is this Django Middleware Thread-safe?

I am writing forum app on Django using custom session/auth/users/acl system. One of goals is allowing users to browse and use my app even if they have cookies off. Coming from PHP world, best solution for problem is appending sid= to every link on page. Here is how I plan to do it:
Session middleware checks if user has session cookie or remember me cookie. If he does, this most likely means cookies work for him. If he doesnt, we generate new session ID, open new session (make new entry in sessions table in DB), then send cookie and redirect user to where he is, but with SID appended to url. After redirect middleware will see if session id can be obtained from either cookie or GET. If its cookie, we stop adding sid to urls. If its GET, we keep them.
I plan to insert SID= part into url's by decorating django.core.urlresolvers.reverse and reverse_lazy with my own function that appends ?sid= to them. However this raises some problems because both middlewares urlresolvers and are not thread safe. To overcome this I created something like this:
class SessionMiddleware(object):
using_decorator = False
original_reverse = None
def process_request(self, request):
self.using_decorator = True
self.original_reverse = urlresolvers.reverse
urlresolvers.reverse = session_url_decorator(urlresolvers.reverse, 's87add8ash7d6asdgas7dasdfsadas')
def process_response(self, request, response):
# Turn off decorator if we are using it
if self.using_decorator:
urlresolvers.reverse = self.original_reverse
self.using_decorator = False
return response
If SID has to be passed via links, process_request sets using_decorator to true and stores undecorated urlresolvers.revers in separate method. After page is rendered process_response checks using_decorator to see if it has to perform "garbage collection". If it does, it returns reverse function to original undecorated state.
My question is, is this approach thread-safe? Or will increase in traffic on my forum may result in middleware decorating those functions again and again and again, failing to run "garbage collection"? I also tought about using regex to simply skim generated HTML response for links and providing template filters and variables for manually adding SID to places that are omitted by regex.
Which approach is better? Also is current one thread safe?
First of all: Using SIDs in the URL is quite dangerous, eg if you copy&paste a link for a friend he is signed in as you. Since most users don't know what a SID is they will run into this issue. As such you should never ever use SIDs in the url and since Facebook and friends all require cookies you should be fine too...
Considering that, monkeypatching urlresolvers.reverse luckily doesn't work! Might be doable with a custom URLResolvers subclass, but I recommend against it.
And yes, your middleware is not threadsafe. Middlewares are initialized only once and shared between threads, meaning that storing anything on self is not threadsafe.

django cache conditional view processing latest_entry

I have a model called aps
class aps(model.Model):
u=models.ForeignKey(User, related_name='u')
when=models.DateTimeField() #datetime when row was inserted
a=models.ForeignKey(User, related_name='a')
a_read=models.BooleanField()
a_last_read=models.DateTimeField()
The records from aps are retrieved like this:
def displayAps(request, name)
apz=aps.objects.order_by('-when').filter(u=User.objects.get(username=name))
return render_to_response(template.html, {'apz':apz})
And further they are shown in template.html ...
What I am trying to achieve is actually what django docz mention here about conditional view processing
I do something like:
def latest_entry(request, name):
return aps.objects.filter().latest('when').when
#cache_page(60*15)
#last_modified(latest_entry)
def displayAps(request, name)
apz=aps.objects.order_by('-when').filter(u=User.objects.get(username=name))
return render_to_response(template.html, {'apz':apz})
If new rows are added the content is not modified, but retrieved from cache. I need to delete the cache files and 'shift refresh' the browser to see the new rows.
Anyone sees what I am doing wrong here?
If you take out the cache_page decorator, does it work?
The cache_page decorator is the first piece of code that actually gets called in your view function. It is checking the timestamp, and returning the cached data, as it is supposed to. If the cache hasn't expired, the last_modified decorator won't ever be called.
You probably want to be more careful about mixing conditional response processing and static caching, anyway. They accomplish similar things, but using very different mechanisms.
cache_page tells django to only use the view to render an actual response every n seconds. If another request comes in before that, the same rendered content will be returned to the client -- whether it is actually stale or not. This reduces your server load, but doesn't do anything to reduce your bandwidth.
last_modified handles the case where the client says "I have a version of this page that is this old; is it still good?" In that case, your server can check the database, and return a very short "It's still good" response if the database hasn't changed. This cuts down your bandwidth needs significantly for those cases, but you still need to go to the database to determine whether the client's cache is stale or not, so your server load may be almost the same.
Like I mentioned above, you can't just apply cache_page before last_modfied -- if the database has changed, cache_page won't know about it. Worse, if the cache timeout has expired, but the database hasn't changed, then you might end up caching the "304 Not modified" message, and sending that for all subsequent visitors for the next fifteen minutes.
You could apply the decorators in the other order, but you have to make a request from the database for each request, and you could still get into a situation where the database has changed, but the cache hasn't expired -- in that case, the client could still be getting the old version of the page, even though the server has already hit the database to determine that it has been updated.
The filter depends on some logged user or something, you should id this user in cookies and use django's vary_on_cookie.

Django caching bug .. even if caching is disabled

I have a Django site where a strange bug is occurring.
On the site they can add "publications", which is basically the same thing as a blog post under a different name.
Things gets weird when they modify an existing post. They first modify it in the admin and when they go on the site, the change isn't visible. Like if the old version was cached.
In fact, at the beginning I was pretty sure it was a browser caching bug. But after some trials, things got a little weirder.
I found out that clearing browser cache or using a different browser does not solve the problem, but rather interestingly it toggles between the old version and the modified version upon refresh.
So if the body of the post was "Hello World" and I modify it to be "Goodbye cruel world" and then go to the site and refresh the page multiple times, I would see "Hello World", then "Goodbye cruel world", then "Hello World" and so on.. no matter how long I keep doing it.
But it doesn't stop there .. after about 24h everything falls back into place and work normally. No permutation anymore, the site sticks to the new version...
I'm pretty much speechless because I built well over 50 other Django sites using the same server and I never had this problem before.
I'm using the latest django (1.3) with a MySQL DB and caching is not enabled..
Any ideas ?
Edit: A graceful restart of Apache solve the problem .. but restarting apache after each update isn't the greatest thing..
Update: I've just re-setuped my dev environement and I found out the bug is far more acute with the dev server. The modified contend won't show up until I kill/restart the dev server, no matter how often I refresh or clear my cache..
The problem is explicitly addressed in the generic views documentation. The querysets in your extra_context dictionary are evaluated once, when the urlconf is first processed, and each time after that they will continue to use the same values. That's why they only change when you reset Apache or the dev server.
The solution, as described on the linked page, is to use callables which return the querysets, rather than specifying the querysets in the dictionary itself.
I had a similar problem once. It turned out I created the object at the top of the urls.py, and the object was alive as long as the process was alive. You may be using a global variable in one of your views.
There are a few other ways to control cache parameters. For example, HTTP allows applications to do the following:
Define the maximum time a page should be cached.Specify whether a cache should always check for newer versions, only delivering the cached content when there are no changes. (Some caches might deliver cached content even if the server page changed, simply because the cache copy isn't yet expired.**)
In Django, use the cache_control view decorator to specify these cache parameters. In this example, cache_control tells caches to revalidate the cache on every access and to store cached versions for, at most, 3,600 seconds:
from django.views.decorators.cache import cache_control
#cache_control(must_revalidate=True, max_age=3600)
def my_view(request):
# ...
Any valid Cache-Control HTTP directive is valid in cache_control(). Here's a full list:
public=True
private=True
no_cache=True
no_transform=True
must_revalidate=True
proxy_revalidate=True
max_age=num_seconds
s_maxage=num_seconds

Invalidating a path from the Django cache recursively

I am deleting a single path from the Django cache like this:
from models import Graph
from django.http import HttpRequest
from django.utils.cache import get_cache_key
from django.db.models.signals import post_save
from django.core.cache import cache
def expire_page(path):
request = HttpRequest()
request.path = path
key = get_cache_key(request)
if cache.has_key(key):
cache.delete(key)
def invalidate_cache(sender, instance, **kwargs):
expire_page(instance.get_absolute_url())
post_save.connect(invalidate_cache, sender = Graph)
This works - but is there a way to delete recursively? My paths look like this:
/graph/123
/graph/123/2009-08-01/2009-10-21
Whenever the graph with id "123" is saved, the cache for both paths needs to be invalidated. Can this be done?
You might want to consider employing a generational caching strategy, it seems like it would fit what you are trying to accomplish. In the code that you have provided, you would store a "generation" number for each absolute url. So for example you would initialize the "/graph/123" to have a generation of one, then its cache key would become something like "/GENERATION/1/graph/123". When you want to expire the cache for that absolute url you increment its generation value (to two in this case). That way, the next time someone goes to look up "/graph/123" the cache key becomes "/GENERATION/2/graph/123". This also solves the issue of expiring all the sub pages since they should be referring to the same cache key as "/graph/123".
Its a bit tricky to understand at first but it is a really elegant caching strategy which if done correctly means you never have to actually delete anything from cache. For more information here is a presentation on generational caching, its for Rails but the concept is the same, regardless of language.
Another option is to use a cache that supports tagging keys and evicting keys by tag. Django's built-in cache API does not have support for this approach. But at least one cache backend (not part of Django proper) does have support.
DiskCache* is an Apache2 licensed disk and file backed cache library, written in pure-Python, and compatible with Django. To use DiskCache in your project simply install it and configure your CACHES setting.
Installation is easy with pip:
$ pip install diskcache
Then configure your CACHES setting:
CACHES = {
'default': {
'BACKEND': 'diskcache.DjangoCache',
'LOCATION': '/tmp/path/to/directory/',
}
}
The cache set method is extended by an optional tag keyword argument like so:
from django.core.cache import cache
cache.set('/graph/123', value, tag='/graph/123')
cache.set('/graph/123/2009-08-01/2009-10-21', other_value, tag='/graph/123')
diskcache.DjangoCache uses a diskcache.FanoutCache internally. The corresponding FanoutCache is accessible through the _cache attribute and exposes an evict method. To evict all keys tagged with /graph/123 simply:
cache._cache.evict('/graph/123')
Though it may feel awkward to access an underscore-prefixed attribute, the DiskCache project is stable and unlikely to make significant changes to the DjangoCache implementation.
The Django cache benchmarks page has a discussion of alternative cache backends.
Disclaimer: I am the original author of the DiskCache project.
Checkout shutils.rmtree() or os.removedirs(). I think the first is probably what you want.
Update based on several comments: Actually, the Django caching mechanism is more general and finer-grained than just using the path for the key (although you can use it at that level). We have some pages that have 7 or 8 separately cached subcomponents that expire based on a range of criteria. Our component cache names reflect the key objects (or object classes) and are used to identify what needs to be invalidated on certain updates.
All of our pages have an overall cache-key based on member/non-member status, but that is only about 95% of the page. The other 5% can change on a per-member basis and so is not cached at all.
How you iterate through your cache to find invalid items is a function of how it's actually stored. If it's files you can use simply globs and/or recursive directory deletes, if it's some other mechanism then you'll have to use something else.
What my answer, and some of the comments by others, are trying to say is that how you accomplish cache invalidation is intimately tied to how you are using/storing the cache.
Second Update: #andybak: So I guess your comment means that all of my commercial Django sites are going to explode in flames? Thanks for the heads up on that. I notice you did not attempt an answer to the problem.
Knipknap's problem is that he has a group of cache items that appear to be related and in a hierarchy because of their names, but the key-generation logic of the cache mechanism obliterates that name by creating an MD5 hash of the path + vary_on. Since there is no trace of the original path/params you will have to exhaustively guess all possible path/params combinations, hoping you can find the right group. I have other hobbies that are more interesting.
If you wish to be able to find groups of cached items based on some combination of path and/or parameter values you must either use cache keys that can be pattern matched directly or some system that retains this information for use at search time.
Because we had needs not-unrelated to the OP's problem, we took control of template fragment caching -- and specifically key generation -- over 2 years ago. It allows us to use regexps in a number of ways to efficiently invalidate groups of related cached items. We also added a default timeout and vary_on variable names (resolved at run time) configurable in settings.py, changed the ordering of name & timeout because it made no sense to always have to override the default timeout in order to name the fragment, made the fragment_name resolvable (ie. it can be a variable) to work better with a multi-level template inheritance scheme, and a few other things.
The only reason for my initial answer, which was indeed wrong for current Django, was because I have been using saner cache keys for so long I literally forgot the simple mechanism we walked away from.