django cache conditional view processing latest_entry - django

I have a model called aps:
class aps(models.Model):
    u = models.ForeignKey(User, related_name='u')
    when = models.DateTimeField()  # datetime when the row was inserted
    a = models.ForeignKey(User, related_name='a')
    a_read = models.BooleanField()
    a_last_read = models.DateTimeField()
The records from aps are retrieved like this:
def displayAps(request, name):
    apz = aps.objects.order_by('-when').filter(u=User.objects.get(username=name))
    return render_to_response('template.html', {'apz': apz})
And further they are shown in template.html ...
What I am trying to achieve is what the Django docs describe under conditional view processing.
I do something like:
def latest_entry(request, name):
    return aps.objects.filter().latest('when').when
@cache_page(60*15)
@last_modified(latest_entry)
def displayAps(request, name):
    apz = aps.objects.order_by('-when').filter(u=User.objects.get(username=name))
    return render_to_response('template.html', {'apz': apz})
If new rows are added, the content is not updated but served from the cache. I have to delete the cache files and force-refresh the browser (Shift+reload) to see the new rows.
Does anyone see what I am doing wrong here?

If you take out the cache_page decorator, does it work?
The cache_page decorator is the first piece of code that actually gets called in your view function. It is checking the timestamp, and returning the cached data, as it is supposed to. If the cache hasn't expired, the last_modified decorator won't ever be called.
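The effect of the decorator order can be sketched without Django. The two decorators below are simplified stand-ins written for this illustration (toy_cache_page and toy_last_modified are hypothetical names, not Django's implementations); they only record which layer handled each call.

```python
import functools

calls = []  # records which layer handled each request


def toy_cache_page(view):
    """Stand-in for a page cache: replays the first response it saw."""
    cached = {}

    @functools.wraps(view)
    def wrapper(request):
        if "response" in cached:
            calls.append("cache hit")
            return cached["response"]
        calls.append("cache miss")
        cached["response"] = view(request)
        return cached["response"]

    return wrapper


def toy_last_modified(view):
    """Stand-in for a freshness check that is meant to run on every request."""

    @functools.wraps(view)
    def wrapper(request):
        calls.append("freshness check")
        return view(request)

    return wrapper


@toy_cache_page       # outermost decorator: runs first
@toy_last_modified    # only reached on a cache miss
def display(request):
    calls.append("view")
    return "rendered page"


display("first request")
display("second request")
```

On the second call only the caching layer runs. The freshness check is never consulted, which matches the behaviour described in the question.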
You probably want to be more careful about mixing conditional response processing and static caching, anyway. They accomplish similar things, but using very different mechanisms.
cache_page tells django to only use the view to render an actual response every n seconds. If another request comes in before that, the same rendered content will be returned to the client -- whether it is actually stale or not. This reduces your server load, but doesn't do anything to reduce your bandwidth.
last_modified handles the case where the client says "I have a version of this page that is this old; is it still good?" In that case, your server can check the database, and return a very short "It's still good" response if the database hasn't changed. This cuts down your bandwidth needs significantly for those cases, but you still need to go to the database to determine whether the client's cache is stale or not, so your server load may be almost the same.
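The "is my copy still good?" exchange can be sketched with the standard library alone. This is a rough illustration of the comparison involved, not Django's code; conditional_response and its parameters are names invented here.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(if_modified_since, last_modified, render):
    """Return (status, body) for a request with an optional If-Modified-Since value."""
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    last_modified = last_modified.replace(microsecond=0)
    if if_modified_since is not None:
        client_time = parsedate_to_datetime(if_modified_since)
        if last_modified <= client_time:
            return 304, ""   # the client's copy is still good: no body is sent
    return 200, render()     # full response, rendered from scratch


changed_at = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
header = format_datetime(changed_at, usegmt=True)  # what a client would send back
```

The database lookup that produces last_modified still happens on every request. Only the rendering and the response body are saved.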
As mentioned above, you can't just apply cache_page before last_modified: if the database has changed, cache_page won't know about it. Worse, if the cache timeout has expired but the database hasn't changed, you might end up caching the "304 Not Modified" response and sending that to all subsequent visitors for the next fifteen minutes.
You could apply the decorators in the other order, but you have to make a request from the database for each request, and you could still get into a situation where the database has changed, but the cache hasn't expired -- in that case, the client could still be getting the old version of the page, even though the server has already hit the database to determine that it has been updated.

If the result depends on the logged-in user, you should identify that user with a cookie and use Django's vary_on_cookie decorator.
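To show what "varying on the cookie" means for a cache, here is a toy key function. It is not how Django builds its cache keys; cache_key and its arguments are made up for this sketch.

```python
import hashlib


def cache_key(path, request_headers, vary_on=()):
    """Build a cache key from the path plus the request headers named in vary_on."""
    # Header names are case-insensitive, so normalise them first.
    headers = {name.lower(): value for name, value in request_headers.items()}
    parts = [path]
    for name in sorted(n.lower() for n in vary_on):
        parts.append(name + "=" + headers.get(name, ""))
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


alice = {"Cookie": "sessionid=alice"}
bob = {"Cookie": "sessionid=bob"}

shared = cache_key("/aps/", alice) == cache_key("/aps/", bob)
separate = cache_key("/aps/", alice, ["Cookie"]) != cache_key("/aps/", bob, ["Cookie"])
```

Without the cookie in the key both users share one cached page. With it, each cookie value gets its own entry.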

Related

Finding when a django cache was set

I'm trying to implement intelligent cache invalidation in Django for my app with an algorithm of the sort:
page = get_cache_for_item(item_pk + some_key)
cache_set_at = page.SOMETHING_HERE
modified = models.Object.objects.filter(pk=item_pk, modified__gt=cache_set_at).exists()  # cheap call
if modified:
    page = get_from_database_and_build_slowly(item_pk)
    set_cache_for_item(item_pk + some_key, page)
return page
The intent is to make a quick database call to get the modified time and, if and only if the page was modified since the cache was set, rebuild the page using the resource- and database-intensive code path.
Unfortunately, I can't figure out how to get the time the cache entry was set at the SOMETHING_HERE step.
Is this possible?
Django does not seem to store that information. The information is (if stored) in the cache implementation of your choice.
This is for example, the way Django stores a key in memcached.
def set(self, key, value, timeout=DEFAULT_TIMEOUT, version=None):
    key = self.make_key(key, version=version)
    if not self._cache.set(key, value, self.get_backend_timeout(timeout)):
        # make sure the key doesn't keep its old value in case of failure to set (memcached's 1MB limit)
        self._cache.delete(key)
Django does not store the creation time and lets the cache backend handle the timeout. So if that information exists anywhere, it is in the backend of your choice. As far as I know, Redis, for example, does not store that value either, so you will not be able to make this work with Redis, even if you bypass Django's cache and query Redis directly.
I think your best choice is to store the timestamp yourself somehow. You could override @cache_page or simply create an improved @smart_cache_page that stores the creation timestamp.
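One way to store the timestamp yourself is to cache it next to the page. The sketch below uses a small in-memory stand-in (DictCache) so that it runs on its own. The assumption is that a real cache object with get and set methods could be dropped in instead. get_page and its parameters are hypothetical names.

```python
import time


class DictCache:
    """Tiny in-memory stand-in exposing get/set, for illustration only."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value


def get_page(cache, key, last_modified, build_page, now=time.time):
    """Return the cached page unless the item changed after it was cached."""
    entry = cache.get(key)
    if entry is not None:
        cached_at, page = entry
        if last_modified <= cached_at:
            return page                # still fresh: skip the slow build
    page = build_page()
    cache.set(key, (now(), page))      # remember when the page was cached
    return page
```

Here last_modified would come from the cheap database query in the question, and build_page is the slow path.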
EDIT:
There might be other easier ways to achieve that. You could use post_save signals. Something like this: Expire a view-cache in Django?
Read carefully through it since the implementation depends on your Django version.

How to prevent Django from writing to django_session table for certain URLs

Apologies if my question is very similar to this one and my approach to trying to solve the issue is 100% based on the answers to that question but I think this is slightly more involved and may target a part of Django that I do not fully understand.
I have a CMS system written in Django 1.5 with a few APIs accessible by two desktop applications which cannot make use of cookies as a browser does.
I noticed that every time an API call is made by one of the applications (once every 3 seconds), a new entry is added to django_session table. Looking closely at this table and the code, I can see that all entries to a specific URL are given the same session_data value but a different session_key. This is probably because Django determines that when one of these calls is made from a cookie-less application, the request.session._session_key is None.
The result is that thousands of entries are created in the django_session table every day, and simply running ./manage clearsessions from a daily cron job does not remove them, making the whole database quite large for no obvious benefit. Note that I even tried set_expiry(1) for these requests, but ./manage clearsessions still doesn't get rid of them.
To overcome this problem through Django, I've had to override 3 Django middlewares as I'm using SessionMiddleware, AuthenticationMiddleware and MessageMiddleware:
from django.contrib.sessions.middleware import SessionMiddleware
from django.contrib.auth.middleware import AuthenticationMiddleware
from django.contrib.messages.middleware import MessageMiddleware
class MySessionMiddleware(SessionMiddleware):
    def process_request(self, request):
        if ignore_these_requests(request):
            return
        super(MySessionMiddleware, self).process_request(request)
    def process_response(self, request, response):
        if ignore_these_requests(request):
            return response
        return super(MySessionMiddleware, self).process_response(request, response)
class MyAuthenticationMiddleware(AuthenticationMiddleware):
    def process_request(self, request):
        if ignore_these_requests(request):
            return
        super(MyAuthenticationMiddleware, self).process_request(request)
class MyMessageMiddleware(MessageMiddleware):
    def process_request(self, request):
        if ignore_these_requests(request):
            return
        super(MyMessageMiddleware, self).process_request(request)
def ignore_these_requests(request):
    if request.POST and request.path.startswith('/api/url1/'):
        return True
    elif request.path.startswith('/api/url2/'):
        return True
    return False
Although the above works, I can't stop thinking that I may have made this more complex than it really is, and that this is not the most efficient approach, as 4 extra checks are made for every single request.
Are there any better ways to do the above in Django? Any suggestions would be greatly appreciated.
Dirty hack: removing session object conditionally.
One approach would be including a single middleware discarding the session object conditional to the request. It's a bit of a dirty hack for two reasons:
The Session object is created at first and removed later. (inefficient)
You're relying on the fact that the Session object isn't written to the database yet at that point. This may change in future Django versions (though not very likely).
Create a custom middleware:
class DiscardSessionForAPIMiddleware(object):
    def process_request(self, request):
        if request.path.startswith("/api/"):  # Or any other condition
            del request.session
Make sure you install this after the django.contrib.sessions.middleware.SessionMiddleware in the MIDDLEWARE_CLASSES tuple in your settings.py.
Also check that settings.SESSION_SAVE_EVERY_REQUEST is set to False (the default). This makes it delay the write to the database until the data is modified.
Alternatives (untested)
Use process_view instead of process_request in the custom middleware so you can check for the view instead of the request path. Advantage: condition check is better. Disadvantage: other middleware might already have done something with the session object and then this approach fails.
Create a custom decorator (or a shared base class) for your API views deleting the session object in there. Advantage: responsibility for doing this will be with the views, the place where you probably like it best (view providing the API). Disadvantage: same as above, but deleting the session object in an even later stage.
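The decorator alternative could look roughly like this. Like the alternatives above it has not been tried against a real Django project; discard_session and api_status are hypothetical names, and the request is a plain stand-in object so the sketch runs on its own.

```python
import functools
from types import SimpleNamespace


def discard_session(view):
    """Drop request.session before the view runs, if it is present."""

    @functools.wraps(view)
    def wrapper(request, *args, **kwargs):
        if hasattr(request, "session"):
            del request.session
        return view(request, *args, **kwargs)

    return wrapper


@discard_session
def api_status(request):
    return "ok"


# Stand-in request object for trying the decorator outside Django.
fake_request = SimpleNamespace(path="/api/status/", session={"user": 1})
result = api_status(fake_request)
```

Because the decorator runs only when the view is called, any middleware that touched the session earlier has already done so, which is the disadvantage mentioned above.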
Make sure your settings.SESSION_SAVE_EVERY_REQUEST is set to False. That will go a long way in ensuring sessions aren't saved every time.
Also, if you have any ajax requests going to your server, ensure that the request includes the cookie information so that the server doesn't think each request belongs to a different person.

Is this Django Middleware Thread-safe?

I am writing a forum app in Django using a custom session/auth/users/ACL system. One of the goals is to let users browse and use my app even if they have cookies turned off. Coming from the PHP world, the usual solution to this problem is appending sid= to every link on the page. Here is how I plan to do it:
The session middleware checks whether the user has a session cookie or a remember-me cookie. If he does, this most likely means cookies work for him. If he doesn't, we generate a new session ID, open a new session (make a new entry in the sessions table in the DB), then send the cookie and redirect the user to where he is, but with the SID appended to the URL. After the redirect, the middleware sees whether the session ID can be obtained from the cookie or from GET. If it's the cookie, we stop adding the SID to URLs. If it's GET, we keep adding it.
I plan to insert the sid= part into URLs by decorating django.core.urlresolvers.reverse and reverse_lazy with my own function that appends ?sid= to them. However, this raises some problems because both middlewares and urlresolvers are not thread-safe. To overcome this I created something like this:
class SessionMiddleware(object):
    using_decorator = False
    original_reverse = None
    def process_request(self, request):
        self.using_decorator = True
        self.original_reverse = urlresolvers.reverse
        urlresolvers.reverse = session_url_decorator(urlresolvers.reverse, 's87add8ash7d6asdgas7dasdfsadas')
    def process_response(self, request, response):
        # Turn off decorator if we are using it
        if self.using_decorator:
            urlresolvers.reverse = self.original_reverse
            self.using_decorator = False
        return response
If the SID has to be passed via links, process_request sets using_decorator to True and stores the undecorated urlresolvers.reverse in a separate attribute. After the page is rendered, process_response checks using_decorator to see whether it has to perform "garbage collection". If it does, it restores the reverse function to its original undecorated state.
My question is: is this approach thread-safe? Or could an increase in traffic on my forum result in the middleware decorating those functions again and again, failing to run the "garbage collection"? I also thought about using a regex to simply scan the generated HTML response for links, and providing template filters and variables for manually adding the SID in places the regex misses.
Which approach is better? And is the current one thread-safe?
First of all: using SIDs in the URL is quite dangerous, e.g. if you copy and paste a link for a friend, they are signed in as you. Since most users don't know what a SID is, they will run into this issue. As such you should never use SIDs in the URL, and since Facebook and friends all require cookies, you should be fine requiring them too.
Considering that, monkeypatching urlresolvers.reverse luckily doesn't work! Might be doable with a custom URLResolvers subclass, but I recommend against it.
And yes, your middleware is not threadsafe. Middlewares are initialized only once and shared between threads, meaning that storing anything on self is not threadsafe.
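The problem with storing per-request data on self can be shown without threads by interleaving two requests by hand. Both classes below are toy middleware-shaped objects written for this sketch, and tag stands in for whatever per-request value is being kept.

```python
from types import SimpleNamespace


class SharedStateMiddleware:
    """Unsafe pattern: per-request data is stored on the shared instance."""

    def process_request(self, request):
        self.tag = request.tag

    def process_response(self, request, response):
        return response + " for " + self.tag


class PerRequestMiddleware:
    """Safer pattern: per-request data travels with the request object."""

    def process_request(self, request):
        request.saved_tag = request.tag

    def process_response(self, request, response):
        return response + " for " + request.saved_tag


def interleave(middleware):
    """Simulate two requests overlapping on one middleware instance."""
    first = SimpleNamespace(tag="alice")
    second = SimpleNamespace(tag="bob")
    middleware.process_request(first)
    middleware.process_request(second)  # starts before the first one finishes
    return middleware.process_response(first, "page")
```

With the shared instance the first request's response ends up carrying the second request's value. Keeping the value on the request avoids that.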

Practical rules for Django MiddleWare ordering?

The official documentation is a bit messy: 'before' and 'after' are used for ordering middleware in a tuple, but in some places 'before' and 'after' refer to request-response phases. Also, 'should be first/last' are mixed and it's not clear which end counts as 'first'.
I do understand the difference; however, it seems too complicated for a newbie in Django.
Can you suggest a correct ordering for the built-in middleware classes (assuming we enable all of them) and, most importantly, explain WHY one goes before or after the others?
Here's the list, with the info from the docs I managed to find:
UpdateCacheMiddleware
    Before those that modify 'Vary:': SessionMiddleware, GZipMiddleware, LocaleMiddleware
GZipMiddleware
    Before any MW that may change or use the response body
    After UpdateCacheMiddleware: modifies 'Vary:'
ConditionalGetMiddleware
    Before CommonMiddleware: uses its 'Etag:' header when USE_ETAGS=True
SessionMiddleware
    After UpdateCacheMiddleware: modifies 'Vary:'
    Before TransactionMiddleware: we don't need transactions here
LocaleMiddleware
    One of the topmost, after SessionMiddleware, CacheMiddleware
    After UpdateCacheMiddleware: modifies 'Vary:'
    After SessionMiddleware: uses session data
CommonMiddleware
    Before any MW that may change the response (it calculates ETags)
    After GZipMiddleware so it won't calculate an ETag on gzipped contents
    Close to the top: it redirects when APPEND_SLASH or PREPEND_WWW
CsrfViewMiddleware
    Before any view middleware that assumes that CSRF attacks have been dealt with
AuthenticationMiddleware
    After SessionMiddleware: uses session storage
MessageMiddleware
    After SessionMiddleware: can use session-based storage
XViewMiddleware
TransactionMiddleware
    After MWs that use DB: SessionMiddleware (configurable to use DB)
    All *CacheMiddleware is not affected (as an exception: uses own DB cursor)
FetchFromCacheMiddleware
    After those that modify 'Vary:', as it uses them to pick a value for the cache hash key
    After AuthenticationMiddleware so it's possible to use CACHE_MIDDLEWARE_ANONYMOUS_ONLY
FlatpageFallbackMiddleware
    Bottom: last resort
    Uses DB, however, is not a problem for TransactionMiddleware (yes?)
RedirectFallbackMiddleware
    Bottom: last resort
    Uses DB, however, is not a problem for TransactionMiddleware (yes?)
(I will add suggestions to this list to collect all of them in one place)
The most difficult part is that you have to consider both directions at the same time when setting the order. I would say that's a flaw in the design and I personally would opt for a separate request and response middleware order (so you wouldn't need hacks like FetchFromCacheMiddleware and UpdateCacheMiddleware).
But... alas, it's this way right now.
Either way, the idea of it all is that your request passes through the list of middlewares in top-down order for process_request and process_view. And it passes your response through process_response and process_exception in reverse order.
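The two directions can be traced with a toy dispatcher. Recorder and handle are invented for this sketch and only mimic the ordering described above, not Django's actual request handler.

```python
class Recorder:
    """Toy middleware that logs when each of its hooks is called."""

    def __init__(self, name, trace):
        self.name = name
        self.trace = trace

    def process_request(self, request):
        self.trace.append("request: " + self.name)

    def process_response(self, request, response):
        self.trace.append("response: " + self.name)
        return response


def handle(middleware, view, request):
    for mw in middleware:              # request phase: top-down
        mw.process_request(request)
    response = view(request)
    for mw in reversed(middleware):    # response phase: bottom-up
        response = mw.process_response(request, response)
    return response


trace = []
stack = [Recorder(name, trace) for name in ("first", "second", "third")]
result = handle(stack, lambda request: "page", None)
```

The trace shows the middleware listed first seeing the request first and the response last, which is why one list position has to satisfy both phases.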
With UpdateCacheMiddleware this means that any middleware that changes the Vary header of the HTTP response should run before it in the response phase, that is, be listed after it in the settings. If you get the order wrong here, it would be possible for one user to get a page that was cached for another user.
How can you find out if the Vary header is changed by a middleware? You can either hope that there are docs available, or simply look at the source. It's usually quite obvious :)
One tip that can save your hair: place TransactionMiddleware in the list so that it cannot roll back changes written to the database by other middlewares when those changes should be committed regardless of whether the view raised an exception.

Disable caching for a view or url in django

In Django, I wrote a view that simply returns a file, and now I am having problems because memcache is trying to cache that view and, in its words, "TypeError: can't pickle file objects".
Since I actually do need to return files with this view (I've essentially made a file-based cache for this view), what I need to do is somehow make it so memcache can't or won't try to cache the view.
I figure this can be done in two ways. First, block the view from being cached (a decorator would make sense here), and second, block the URL from being cached.
Neither seems to be possible, and nobody else seems to have run into this problem, at least not on the public interwebs. Help?
Update: I've tried the @never_cache decorator, and even thought it was working, but while that sets the headers so other people won't cache things, my local machine still does.
from django.views.decorators.cache import never_cache
@never_cache
def myview(request):
    # ...
Documentation is here...
Returning a real, actual file object from a view sounds like something is wrong. I can see returning the contents of a file, feeding those contents into an HttpResponse object. If I understand you correctly, you're caching the results of this view into a file. Something like this:
def myview(request):
    file = open('somefile.txt','r')
    return file  # This isn't going to work. You need to return an HttpResponse object.
I'm guessing that if you turned caching off entirely in settings.py, your "can't pickle a file object" would turn into a "view must return an http response object."
If I'm on the right track with what's going on, then here are a couple of ideas.
You mentioned you're making a file-based cache for this one view. You sure you want to do that instead of just using memcached?
If you really do want a file, then do something like:
from django.http import HttpResponse
def myview(request):
    # Read the file and return its contents, not the file object itself.
    with open('somefile.txt', 'r') as f:
        contents = f.read()
    resp = HttpResponse()
    resp.write(contents)
    return resp
That will solve your "cannot pickle a file" problem.
You probably set up the per-site cache, but what you want now is the per-view cache. The first is easier to implement but is only meant for the case of 'just cache everything'. Since you now want to choose for each view, switch to the fine-grained approach. It is also very easy to use, but remember that you sometimes need to create a second view with the same contents if you want the result cached for some URLs and not for others.
So much for the answer to your question. But is that an answer to your problem? Why do you return files in a view? Normally static files like videos, pictures, CSS, Flash games or whatever should be handled by the web server itself (or even by a different server). I guess that is what you want to do in that view. Is that correct? The reason for not letting Django do this is that starting Django and letting it do its thing also eats a lot of resources and time. You don't feel that when you are the only user in your test environment, but when you want to scale to some thousand users or more, this kind of thing becomes very nasty. From a logical point of view it also does not seem smart to let a program pass files through unchanged when its normal job is to generate or change HTML according to the state of your data and a user request. It's like letting your accountant do the programming work. While he might be able to do it, you probably want somebody else to do it and let the accountant take care of your books.