How to invalidate cache_page in Django? - django

Here is the problem: I have blog app and I cache the post output view for 5 minutes.
@cache_page(60 * 5)
def article(request, slug):
    ...
However, I'd like to invalidate the cache whenever a new comment is added to the post.
I'm wondering how best to do so?
I've seen this related question, but it is outdated.

I would cache in a bit different way:
def article(request, slug):
    cached_article = cache.get('article_%s' % slug)
    if not cached_article:
        cached_article = Article.objects.get(slug=slug)
        cache.set('article_%s' % slug, cached_article, 60*5)
    return render(request, 'article/detail.html', {'article':cached_article})
Then, when saving a new comment on this article, delete the cached entry:
# ...
# add the new comment to this article object, then
cache.delete('article_%s' % article.slug)  # no-op if the key is not cached
# ...

This was the first hit for me when searching for a solution, and the current answer wasn't terribly helpful, so after a lot of poking around Django's source, I have an answer for this one.
Yes you can know the key programmatically, but it takes a little work.
Django's page caching works by referencing the request object, specifically the request path and query string. This means that for every request to your page that has a different query string, you will have a different cache key. For most cases, this isn't likely to be a problem, since the page you want to cache/invalidate will be a known string like /blog/my-awesome-year, so to invalidate this, you just need to use Django's RequestFactory:
from django.core.cache import cache
from django.test import RequestFactory
from django.urls import reverse
from django.utils.cache import get_cache_key
cache.delete(get_cache_key(RequestFactory().get("/blog/my-awesome-year")))
If your URLs are a fixed list of values (i.e. no differing query strings) then you can stop here. However, if you've got lots of different query strings (say ?q=xyz for a search page or something), then your best bet is probably to create a separate cache for each view. Then you can just pass cache="cachename" to cache_page() and later clear that entire cache with:
from django.core.cache import caches
caches["my_cache_name"].clear()
Important note about this tactic
It only really works for unauthenticated pages. The minute your user is logged in, the cookie data is made part of the cache key creation process, and therefore re-creating that key programmatically becomes much harder. I suppose you could try pulling the cookie data out of your session store, but there could be thousands of keys in there, and you'd have to invalidate/pre-cache each and every one of them.
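For the separate-cache approach, the settings might look roughly like the sketch below. The alias name article_pages is hypothetical, and the local-memory backend is used only to keep the example self-contained; any configured backend should work the same way.

```python
# settings.py (sketch): a dedicated cache alias for one cached view.
# "article_pages" is a hypothetical alias name chosen for illustration.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
        "LOCATION": "default",
    },
    "article_pages": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
        # A distinct LOCATION keeps this cache's entries apart from "default".
        "LOCATION": "article-pages",
    },
}
```

The view would then be decorated with cache_page(60 * 5, cache="article_pages"), and calling caches["article_pages"].clear() after saving a comment drops every cached variant of that view at once. Note that the local-memory backend is per-process, so a shared backend such as Memcached is needed when several worker processes serve the site.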

Related

flask how to keep database queries references up to date

I am creating a Flask app with two panels, one for the admin and the other for users. In the app I have a utilities file where I keep most of the redundant variables alongside other functions (by redundant I mean that I use them in many different parts of the application).
utilities.py
# ...
opening_hour = db_session.query(Table.column).one()[0] # 10:00 AM
# ...
The value of Table.column, or let's say of the opening_hour variable above, is entered into the database by the admin through his/her web panel. This value prevents users from accessing certain functionality of the application before the specified hour.
The problem is:
If the admin changes that value through his/her web panel, let's say to 11:00 AM, the change is not shown in the users' panel, even though it was written to the database.
If I want the new opening_hour value to take effect, I have to manually shut down the app and restart it (sometimes even this doesn't work).
I have tried adding gc.collect(), which did nothing. There must be a way around this other than shutting down and restarting the app manually. First, I doubt the admin will be able to do that. Second, even if he/she can, that would be really frustrating.
If someone can relate to this please explain why is this occurring and how to get around it. Thanks in advance :)
You are trying to add advanced logic to a simple variable: you want to query the DB only once, and periodically force the variable to update by reloading the module. That's not how modules and the import mechanism are supposed to be used.
If you want to access a possibly changing value from the database, you have to read it over and over again.
The solution is to define, instead of a variable, a function opening_hours that executes the DB query every time you check the value:
def opening_hours():
    return (
        db_session.query(Table.column).one()[0], # 10:00 AM
        db_session.query(Table.column).one()[1] # 5:00 PM
    )
Now you may not want to have to query the Database every time you check the value, but maybe cache it for a few minutes. The easiest would be to use cachetools for that:
import cachetools
cache = cachetools.TTLCache(maxsize=10, ttl=60) # Cache for 60 seconds
@cachetools.cached(cache)
def opening_hours():
    return (
        db_session.query(Table.column).one()[0], # 10:00 AM
        db_session.query(Table.column).one()[1] # 5:00 PM
    )
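A time-based cache still serves the old value for up to one TTL after the admin changes it. One way to close that gap is to drop the cached value explicitly when the admin saves. The sketch below uses only the standard library so it runs on its own; load_opening_hours_from_db is a hypothetical stand-in for the real query, and invalidate_opening_hours is a name made up for illustration.

```python
import time

_CACHE_TTL_SECONDS = 60.0
_cached_value = None
_cached_at = None  # None means "nothing cached yet"


def load_opening_hours_from_db():
    """Hypothetical stand-in for the real database query."""
    load_opening_hours_from_db.calls += 1
    return ("10:00", "17:00")


load_opening_hours_from_db.calls = 0  # counts how often the "DB" is hit


def opening_hours(now=None):
    """Return the cached value, re-reading it once the TTL has elapsed."""
    global _cached_value, _cached_at
    now = time.monotonic() if now is None else now
    if _cached_at is None or now - _cached_at >= _CACHE_TTL_SECONDS:
        _cached_value = load_opening_hours_from_db()
        _cached_at = now
    return _cached_value


def invalidate_opening_hours():
    """Call this right after the admin saves a new value."""
    global _cached_at
    _cached_at = None
```

With several worker processes, the explicit invalidation only reaches the process that handled the admin's request; the others pick up the change when their TTL runs out.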
Also, since you are using Flask, you can create a route decorator that controls access to your views depending on the time of day:
from datetime import datetime
from functools import wraps
from flask import render_template
def only_within_office_hours(f):
    @wraps(f)
    def decorated_function(*args, **kwargs):
        start_time, stop_time = opening_hours()
        # Show the error page when the current time is outside office hours
        if not (start_time <= datetime.now().time() <= stop_time):
            return render_template('office_hours_error.html')
        return f(*args, **kwargs)
    return decorated_function
that you can use like
@app.route('/secret_page')
@login_required
@only_within_office_hours
def secret_page():
    pass
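The time comparison inside such a decorator is easy to get backwards, so it can help to pull it into a small function that can be checked without Flask. This is only a sketch: the name is_within_hours is hypothetical, and the handling of windows that wrap past midnight is an assumption that goes beyond the original question.

```python
from datetime import time


def is_within_hours(now, start, stop):
    """Return True when `now` falls inside the [start, stop] window.

    A window with start > stop is treated as wrapping past midnight
    (for example 22:00 to 02:00).
    """
    if start <= stop:
        return start <= now <= stop
    return now >= start or now <= stop
```

The decorator would render the error page when this returns False and call the wrapped view otherwise.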

Django Sessions via Memcache: Cannot find session key manually

I recently migrated from database backed sessions to sessions stored via memcached using pylibmc.
Here is my CACHES, SESSION_CACHE_ALIAS & SESSION_ENGINE in my settings.py
CACHES = {
'default': {
'BACKEND': 'django.core.cache.backends.memcached.PyLibMCCache',
'LOCATION': ['127.0.0.1:11211'],
}
}
SESSION_CACHE_ALIAS = 'default'
SESSION_ENGINE = "django.contrib.sessions.backends.cache"
Everything is working fine behind the scenes and I can see that it is using the new caching system. Running the get_stats() method from pylibmc shows me the number of current items in the cache and I can see that it has gone up by 1.
The issue is I'm unable to grab the session manually using pylibmc.
Upon inspecting the request session data in views.py:
def my_view(request):
    if request.user.is_authenticated():
        print request.session.session_key
        # the above prints something like this: "1ay2kcv7axb3nu5fwnwoyf85wkwsttz9"
        print request.session.cache_key
        # the above prints something like this: "django.contrib.sessions.cache1ay2kcv7axb3nu5fwnwoyf85wkwsttz9"
        return HttpResponse(status=200)
    else:
        return HttpResponse(status=401)
I noticed that when printing cache_key, it prints with the default KEY_PREFIX whereas for session_key it didn't. Take a look at the comments in the code to see what I mean.
So I figured, "Ok great, one of these key names should work. Let me try grabbing the session data manually just for educational purposes":
import pylibmc
mc = pylibmc.Client(['127.0.0.1:11211'])
# Let's try key "1ay2kcv7axb3nu5fwnwoyf85wkwsttz9"
mc.get("1ay2kcv7axb3nu5fwnwoyf85wkwsttz9")
Hmm nothing happens, no key exists by that name. Ok no worries, let's try the cache_key then, that should definitely work right?
mc.get("django.contrib.sessions.cache1ay2kcv7axb3nu5fwnwoyf85wkwsttz9")
What? How am I still getting nothing back? As a test, I set and get a random key value to see if it works, and it does. I run get_stats() again just to make sure that the key does exist. I also test the web app to see if my session is indeed working, and it is. So this leads me to conclude that there is a different naming scheme that I'm unaware of.
If so, what is the correct naming scheme?
Yes, the cache key used internally by Django is, in general, different to the key sent to the cache backend (in this case pylibmc / memcached). Let us call these two keys the django cache key and the final cache key respectively.
The django cache key given by request.session.cache_key is for use with Django's low-level cache API, e.g.:
>>> from django.core.cache import cache
>>> cache.get(request.session.cache_key)
{'_auth_user_hash': '1ay2kcv7axb3nu5fwnwoyf85wkwsttz9', '_auth_user_id': u'1', '_auth_user_backend': u'django.contrib.auth.backends.ModelBackend'}
The final cache key on the other hand, is a composition of the key prefix, the django cache key, and the cache version number. The make_key function (from Django docs) below demonstrates how these three values are composed to generate this key:
def make_key(key, key_prefix, version):
    return ':'.join([key_prefix, str(version), key])
By default, key_prefix is the empty string and version is 1.
Finally, by inspecting make_key we find that the correct final cache key to pass to mc.get is
:1:django.contrib.sessions.cache1ay2kcv7axb3nu5fwnwoyf85wkwsttz9
which has the form <KEY_PREFIX>:<VERSION>:<KEY>.
Note: the final cache key can be changed by defining KEY_FUNCTION in the cache settings.
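As a quick check, the composition can be reproduced in plain Python. The sketch below copies the make_key shown above, fills in the default prefix and version, and uses the session key from the question; it assumes KEY_PREFIX, VERSION and KEY_FUNCTION are all left at their defaults.

```python
def make_key(key, key_prefix="", version=1):
    """Same composition as the documented default: <prefix>:<version>:<key>."""
    return ":".join([key_prefix, str(version), key])


session_key = "1ay2kcv7axb3nu5fwnwoyf85wkwsttz9"
django_cache_key = "django.contrib.sessions.cache" + session_key

# This is the string that is actually sent to memcached.
final_key = make_key(django_cache_key)
print(final_key)
```

If KEY_PREFIX were set to, say, "myapp", the same call with key_prefix="myapp" would give a key starting with "myapp:1:".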

Django caching a large list

My django application deals with 25MB binary files. Each of them has about 100,000 "records" of 256 bytes each.
It takes me about 7 seconds to read the binary file from disk and decode it using python's struct module. I turn the data into a list of about 100,000 items, where each item is a dictionary with values of various types (float, string, etc.).
My django views need to search through this list. Clearly 7 seconds is too long.
I've tried using django's low-level caching API to cache the whole list, but that won't work because there's a maximum size limit of 1MB for any single cached item. I've tried caching the 100,000 list items individually, but that takes a lot more than 7 seconds - most of the time is spent unpickling the items.
Is there a convenient way to store a large list in memory between requests? Can you think of another way to cache the object for use by my django app?
Raise the item size limit to 10 MB (up from the 1 MB default) by adding
-I 10m
to /etc/memcached.conf, then restart memcached.
Also edit this class in memcached.py, located in /usr/lib/python2.7/dist-packages/django/core/cache/backends, to look like this:
class MemcachedCache(BaseMemcachedCache):
    "An implementation of a cache binding using python-memcached"
    def __init__(self, server, params):
        import memcache
        memcache.SERVER_MAX_VALUE_LENGTH = 1024*1024*10 # added limit to accept 10 MB
        super(MemcachedCache, self).__init__(server, params,
                                             library=memcache,
                                             value_not_found_exception=ValueError)
I'm not able to add comments yet, but I wanted to share my quick fix around this problem, since I had the same problem with python-memcached behaving strangely when you change the SERVER_MAX_VALUE_LENGTH at import time.
Well, besides the __init__ edit that FizxMike suggests you can also edit the _cache property in the same class. Doing so you can instantiate the python-memcached Client passing the server_max_value_length explicitly, like this:
from django.core.cache.backends.memcached import BaseMemcachedCache
DEFAULT_MAX_VALUE_LENGTH = 1024 * 1024
class MemcachedCache(BaseMemcachedCache):
    def __init__(self, server, params):
        # options from settings.CACHES[<alias>]['OPTIONS']
        self._options = params.get("OPTIONS", {})
        import memcache
        memcache.SERVER_MAX_VALUE_LENGTH = self._options.get('SERVER_MAX_VALUE_LENGTH', DEFAULT_MAX_VALUE_LENGTH)
        super(MemcachedCache, self).__init__(server, params,
                                             library=memcache,
                                             value_not_found_exception=ValueError)
    @property
    def _cache(self):
        if getattr(self, '_client', None) is None:
            server_max_value_length = self._options.get("SERVER_MAX_VALUE_LENGTH", DEFAULT_MAX_VALUE_LENGTH)
            # one could optionally send more parameters here through the options settings,
            # I simplified here for brevity
            self._client = self._lib.Client(self._servers,
                                            server_max_value_length=server_max_value_length)
        return self._client
I also prefer to create another backend that inherits from BaseMemcachedCache and use it instead of editing django code.
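With a separate backend class like the one above, the settings could point at it and pass the limit through OPTIONS. This is a sketch only: the dotted path myproject.cache_backends.MemcachedCache is a hypothetical location for that subclass, and the 10 MB figure simply mirrors the value used earlier in this thread.

```python
# settings.py (sketch). "myproject.cache_backends.MemcachedCache" is a
# hypothetical dotted path to the custom backend subclass.
TEN_MEGABYTES = 10 * 1024 * 1024

CACHES = {
    "default": {
        "BACKEND": "myproject.cache_backends.MemcachedCache",
        "LOCATION": "127.0.0.1:11211",
        "OPTIONS": {
            # Read by the custom backend's __init__ and _cache property.
            "SERVER_MAX_VALUE_LENGTH": TEN_MEGABYTES,
        },
    }
}
```

The memcached server itself still has to be started with a matching item size limit (the -I option mentioned above), otherwise it will reject the larger values.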
here's the django memcached backend module for reference:
https://github.com/django/django/blob/master/django/core/cache/backends/memcached.py
Thanks for all the help on this thread!

Django: Passing data to view from url dispatcher without including the data in the url?

I've got my mind set on dynamically creating URLs in Django, based on names stored in database objects. All of these pages should be handled by the same view, but I would like the database object to be passed to the view as a parameter when it is called. Is that possible?
Here is the code I currently have:
places = models.Place.objects.all()
for place in places:
    name = place.name.lower()
    urlpatterns += patterns('',
        url(r'^'+name +'/$', 'misc.views.home', name='places.'+name)
    )
Is it possible to pass extra information to the view, without adding more parameters to the URL? Since the URLs are for the root directory, and I still need 404 pages to show on other values, I can't just use a string parameter. Is the solution to give up on trying to add the URLs to root, or is there another solution?
I suppose I could do a lookup on the name itself, since all URLs have to be unique anyway. Is that the only other option?
I think you can pass a dictionary to the view with additional attributes, like this:
url(r'^'+name +'/$', 'misc.views.home', {'place' : place}, name='places.'+name)
And you can change the view to expect this parameter.
That's generally a bad idea since it will query the database for every request, not only requests relevant to that model. A better idea is to come up with the general url composition and use the same view for all of them. You can then retrieve the relevant place inside the view, which will only hit the database when you reach that specific view.
For example:
urlpatterns += patterns('',
    url(r'^places/(?P<name>\w+)/$', 'misc.views.home', name='places.view_place')
)
# views.py
def home(request, name):
    # Note: .get() raises Place.DoesNotExist for an unknown name;
    # get_object_or_404 would turn that into a 404 response instead.
    place = models.Place.objects.get(name__iexact=name)
    # Do more stuff here
I realize this is not what you truly asked for, but it should give you far fewer headaches.

How to generate temporary URLs in Django

Wondering if there is a good way to generate temporary URLs that expire in X days. Would like to email out a URL that the recipient can click to access a part of the site that then is inaccessible via that URL after some time period. No idea how to do this, with Django, or Python, or otherwise.
If you don't expect to get a large response rate, then you should try to store all of the data in the URL itself. This way, you don't need to store anything in the database, and will have data storage proportional to the responses rather than the emails sent.
Updated: Let's say you had two strings that were unique for each user. You can pack them and unpack them with a protecting hash like this:
import hashlib, zlib
import cPickle as pickle
import urllib
my_secret = "michnorts"
def encode_data(data):
    """Turn `data` into a hash and an encoded string, suitable for use with `decode_data`."""
    text = zlib.compress(pickle.dumps(data, 0)).encode('base64').replace('\n', '')
    m = hashlib.md5(my_secret + text).hexdigest()[:12]
    return m, text
def decode_data(hash, enc):
    """The inverse of `encode_data`."""
    text = urllib.unquote(enc)
    m = hashlib.md5(my_secret + text).hexdigest()[:12]
    if m != hash:
        raise Exception("Bad hash!")
    data = pickle.loads(zlib.decompress(text.decode('base64')))
    return data
hash, enc = encode_data(['Hello', 'Goodbye'])
print hash, enc
print decode_data(hash, enc)
This produces:
849e77ae1b3c eJzTyCkw5ApW90jNyclX5yow4koMVnfPz09JqkwFco25EvUAqXwJnA==
['Hello', 'Goodbye']
In your email, include a URL that has both the hash and enc values (properly url-quoted). In your view function, use those two values with decode_data to retrieve the original data.
The zlib.compress step may not help much, depending on your data; you can experiment to see what works best for you.
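The code above is Python 2 and does not carry an expiry. A rough Python 3 sketch of the same idea follows, using only the standard library. It swaps in HMAC-SHA256 for the truncated MD5 and JSON for pickle, and it adds an "exp" timestamp to the payload so the link can expire; those choices, the SECRET value and the now parameter (there to make the functions testable) are all assumptions of this sketch rather than part of the answer above.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"replace-with-a-real-secret"  # hypothetical; keep it out of source control


def encode_data(data, max_age_seconds, now=None):
    """Return (signature, payload) strings for embedding in a URL."""
    now = int(time.time()) if now is None else int(now)
    body = json.dumps({"data": data, "exp": now + max_age_seconds})
    payload = base64.urlsafe_b64encode(body.encode("utf-8")).decode("ascii")
    signature = hmac.new(SECRET, payload.encode("ascii"), hashlib.sha256).hexdigest()
    return signature, payload


def decode_data(signature, payload, now=None):
    """Verify the signature and the expiry, then return the original data."""
    expected = hmac.new(SECRET, payload.encode("ascii"), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(expected, signature):
        raise ValueError("bad signature")
    body = json.loads(base64.urlsafe_b64decode(payload.encode("ascii")))
    now = time.time() if now is None else now
    if now > body["exp"]:
        raise ValueError("link expired")
    return body["data"]
```

The payload may end in "=" padding, so it should still be URL-quoted when it is placed in a link. The signature is checked before the payload is decoded, so tampered input is rejected without being parsed.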
You could set this up with URLs like:
http://yoursite.com/temp/1a5h21j32
Your URLconf would look something like this:
from django.conf.urls.defaults import *
urlpatterns = patterns('',
    (r'^temp/(?P<hash>\w+)/$', 'yoursite.views.tempurl'),
)
...where tempurl is a view handler that fetches the appropriate page based on the hash. Or, sends a 404 if the page is expired.
models
class TempUrl(models.Model):
    url_hash = models.CharField("Url", blank=False, max_length=32, unique=True)
    expires = models.DateTimeField("Expires")
views
def generate_url(request):
    # do actions that result in creating the object and mailing it
    pass
def load_url(request, hash):
    url = get_object_or_404(TempUrl, url_hash=hash, expires__gte=datetime.now())
    data = get_some_data_or_whatever()
    return render_to_response('some_template.html', {'data':data},
                              context_instance=RequestContext(request))
urls
urlpatterns = patterns('', url(r'^temp/(?P<hash>\w+)/$', 'your.views.load_url', name="url"),)
# of course you need some imports and templates
It depends on what you want to do - one-shot things like account activation or allowing a file to be downloaded could be done with a view which looks up a hash, checks a timestamp and performs an action or provides a file.
More complex stuff such as providing arbitrary data would also require the model containing some reference to that data so that you can decide what to send back. Finally, allowing access to multiple pages would probably involve setting something in the user's session and then using that to determine what they can see, followed by a redirect.
If you could provide more detail about what you're trying to do and how well you know Django, I can make a more specific reply.
I think the solution lies in a combination of all the suggested solutions. I'd suggest using an expiring session, so that the link expires within the time period you specify in the model. Combined with a redirect and middleware that checks whether a session attribute exists and whether the requested URL requires it, you can create somewhat secure parts of your site with nicer URLs that reference permanent parts of the site. I use this for demonstrating designs/features for a limited time. This also helps to prevent forwarding: I don't do it, but you could remove the temp URL after the first click, so that only the session attribute provides access, which more effectively limits it to one user. I personally don't mind if the temp URL gets forwarded, knowing it will only last for a certain amount of time. It works well in a modified form for tracking invited visits as well.
It might be overkill, but you could use a UUIDField on your model and set up a Celery beat task to change the UUID at any time interval you choose.
If Celery is too much, and it might be, you could just store the time the URL is first sent, compute a timedelta whenever it is used thereafter, and redirect if the elapsed time is greater than what you want. I think the second solution is very straightforward and would extend easily. It would be a matter of having a model with the URL, time first sent, time most recently sent, a disabled flag, and a delta that you find acceptable for the URL to live.
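The elapsed-time check itself is a one-liner. A minimal sketch, assuming the model stores a first_sent datetime and that seven days is the acceptable lifetime (both the function name and the default are made up for illustration):

```python
from datetime import datetime, timedelta


def link_is_expired(first_sent, now, max_age=timedelta(days=7)):
    """True once more than `max_age` has passed since the link was first sent."""
    return now - first_sent > max_age
```

A view would call this with the stored timestamp and the current time, and redirect when it returns True.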
A temporary url can also be created by combining the ideas from @ned-batchelder's answer and @matt-howell's answer with Django's signing module.
The signing module provides a convenient way to encode data in the url, if necessary, and to check for link expiration. This way we don't need to touch the database or session/cache.
Here's a minimal example with an index page and a temp page:
The index page has a link to a temporary url, with the specified expiration. If you try to follow the link after expiration, you'll get a status 400 "Bad Request" (or you'll see the SuspiciousOperation error, if DEBUG is True).
urls.py
...
urlpatterns = [
    path('', views.index, name='index'),
    path('<str:signed_data>/', views.temp, name='temp'),
]
views.py
from django.core import signing
from django.core.exceptions import SuspiciousOperation
from django.http import HttpResponse
from django.urls import reverse
MAX_AGE_SECONDS = 20 # short expiration, for illustrative purposes
def generate_temp_url(data=None):
    # signing.dumps() returns a "URL-safe, signed base64 compressed JSON string"
    # with a timestamp
    return reverse('temp', args=[signing.dumps(data)])
def index(request):
    # just a convenient usage example
    return HttpResponse(f'<a href="{generate_temp_url()}">temporary link</a>')
def temp(request, signed_data):
    try:
        # load data and check expiration
        data = signing.loads(signed_data, max_age=MAX_AGE_SECONDS)
    except signing.BadSignature:
        # triggers an HttpResponseBadRequest (status 400) when DEBUG is False
        raise SuspiciousOperation('invalid signature')
    # success
    return HttpResponse(f'Here\'s your data: {data}')
Some notes:
The responses in the example are very rudimentary, and only for illustrative purposes.
Raising a SuspiciousOperation is convenient, but you could e.g. return an HttpResponseNotFound (status 404) instead.
The generate_temp_url() returns a relative path. If you need an absolute url, you can do something like:
temp_url = request.build_absolute_uri(generate_temp_url())
If you're worried about leaking the signed data, have a look at alternatives such as Django's password reset implementation.