I used to cache a database query in a global variable to speed up my application. Since this is strongly discouraged (and it did cause problems), I want to use one of the Django cache backends instead. I tried LocMemCache and DatabaseCache, but both take about 15 seconds to set my variable (twice as long as it takes to generate the data, which is 7MB in size).
Is that expected? Am I doing something wrong?
(Memcached is limited to 1MB, and I cannot split my data, which consists of arbitrarily big binary masks.)
Edit: FileBasedCache takes 30s to set as well.
Settings.py:
CACHES = {
    'default': {...},
    'stats': {
        'BACKEND': 'django.core.cache.backends.db.DatabaseCache',
        # or 'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
        'LOCATION': 'stats',
    },
}
Service.py:
from django.core.cache import caches
def stats_service():
    stats_cache = caches['stats']
    if stats_cache.get('key') is None:
        stats_cache.set('key', data) # 15s with DatabaseCache, 30s with LocMemCache
    return stats_cache.get('key')
Global variable (super fast) version:
_cache = {}
def stats_service():
    if _cache.get('key') is None:
        _cache['key'] = data
    return _cache['key']
One option may be to use diskcache.DjangoCache. DiskCache extends the Django cache API to support writing and reading binary streams as-is (avoid pickling). It works particularly well for large values (like those greater than 1MB). DiskCache is an Apache2 licensed disk and file backed cache library, written in pure-Python, and compatible with Django.
In your case, you could use the ndarray tobytes and numpy frombuffer methods (tostring and fromstring in older NumPy releases) to quickly convert to/from a Python byte string. Then wrap the bytes with io.BytesIO to store/retrieve in the cache. For example:
import io
import numpy
from django.core.cache import cache
value = cache.get('cache-key', read=True)
if value:
    # frombuffer defaults to float64; pass dtype= (and reshape) to match the array that was stored.
    data = numpy.frombuffer(value.read())
    value.close()
else:
    data = ... # Generate 7MB array.
    cache.set('cache-key', io.BytesIO(data.tobytes()), read=True)
DiskCache extends the Django cache API by permitting file-like values which are stored as binary blobs on disk. The Django cache benchmarks page has a discussion and comparison of alternative cache backends.
This snippet actually works fine: https://djangosnippets.org/snippets/2396/
As I understand it, the only problem with using global variables for caching is thread safety, and this no-pickle version is thread-safe.
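To make the thread-safety point concrete, here is a minimal sketch of a module-level cache guarded by a lock. It is not the code from the linked snippet; the names `get_or_compute` and `_cache_lock` are made up for illustration, and it assumes the only concern is two threads computing the same value at once.

```python
import threading

_cache = {}
_cache_lock = threading.Lock()

def get_or_compute(key, compute):
    """Return the cached value for key, calling compute() at most once."""
    # Fast path: once the value is stored, no lock is needed to read it.
    try:
        return _cache[key]
    except KeyError:
        pass
    # Slow path: only one thread at a time may populate the cache.
    with _cache_lock:
        if key not in _cache:  # re-check, another thread may have won the race
            _cache[key] = compute()
        return _cache[key]
```

The value is stored and returned by reference, so nothing is pickled or copied. That is why it is fast, but it also means callers must not mutate the returned object, and each worker process keeps its own copy.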
Background
I have a website running Django 2.0 on AWS ElasticBeanstalk. I have a couple views on my website that take some time to calculate, so I thought I'd look into some simple caching. I decided on LocMemCache because it looked like the quickest to set up that would meet my needs. (I'm using AWS, so using Memcached apparently requires ElastiCache, which adds cost and is additional setup overhead that I wanted to avoid.)
The views do not change often, and the site is not high-traffic, so I put long timeouts on the caches. There are three views where I have enabled caching:
A report generated inside a template – uses Template Fragment caching
A list of locations requested by AJAX and used in a JS library – uses per-view caching
A dynamically-generated binary file download – uses per-view caching
The caching is set up and works great.
The data that goes into these views is added and edited by other staff at my company, who are used to their changes appearing immediately. So in order to address questions such as, "I updated this data, why has the webpage not updated?" I wanted to create a "Clear Server Cache" button, accessible by staff, to force a cache reset.
The button is set up and functioning. It requests a view that calls cache.clear() from django.core.cache. I used the sledgehammer cache.clear() approach because the way to specify an individual per-view cache in code seems to be a bit clunky and convoluted, so the "clear it all" approach seemed adequate. And at the very least it should always "work" in the sense that all the data will get re-loaded again.
The Problem
When I use the button to call cache.clear(), it only clears the Template Fragment cache. It does not seem to clear the per-view caches. Why?
According to Django Documentation,
Be careful with this; clear() will remove everything from the cache, not just the keys set by your application.
So why is it not touching the per-view caches? Doesn't the warning seem to indicate that clear() is dangerous specifically because it's a sledgehammer and nothing at all is spared? What am I missing?
Does AWS use some kind of special memory that's immune to this sort of culling? (If this is the case, then why are the Template Fragments successfully cleared?) I did notice (and find it interesting) that the cache remains even after deploying a new image to the same environment.
I could switch to using Database caching, but I'd like to understand why this isn't working so I don't need to abandon LocMemCache as an option to ever use in the future.
I could also move the others to use Template Fragment caching, but if I ever expand the caching to fit other needs, I will want to be able to use per-view caching. Also, this solution would be less than ideal for a binary-file-download view.
settings.py
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
        ## 'LOCATION': '',
    },
}
portfolio.html (Cache #1 – Template Fragment)
{% load static cache compress %}
...
<div id="total-portfolio-content">
{% cache 7200 portfolio %}{% include 'reports/total_portfolio/report_include.html' %}{% endcache %}
</div>
map/urls.py (Cache #2 – Per-view)
from django.conf.urls import include, url
app_name = 'map'
urlpatterns = [
    ## Yes, I know this uses the old-style url(). I have plans to upgrade the entire project.
    url(r'^(?P<tg>[\w-]+)/data.geojson$',
        cache_page(60 * 60 * 12)(
            views.NamedGeoJSONLayerView.as_view(model=FacilityCoord)),
        name='tg-data'),
]
resources/urls/__init__.py (Cache #3 – Per-view)
from django.conf.urls import include, url
app_name = 'resources'
urlpatterns = [
    url(r'^download/$',
        cache_page(60 * 60 * 12)(
            views.DownloadMetricXLSX.as_view()),
        name='download'),
]
myadmin/views.py (Cache Clear button)
from django.contrib.admin.views.decorators import staff_member_required
from django.core.cache import cache
from django.http import JsonResponse
@staff_member_required(login_url=login_url)
def clear_cache(request):
    cache.clear()
    ## And because that doesn't seem to work as advertised, I also tried....
    ## taken from <https://djangosnippets.org/snippets/1080/>
    try:
        cache._cache.clear() # in-memory caching
        cache._expire_info.clear()
    except AttributeError:
        # I think this only applies to filesystem caching? Just grasping at straws.
        old_freq = cache._cull_frequency
        old_max = cache._max_entries
        cache._max_entries = 0
        cache._cull_frequency = 1
        cache._cull()
        cache._cull_frequency = old_freq
        cache._max_entries = old_max
    return JsonResponse({'success': True})
The problem is in the response headers. The cache_page decorator automatically adds a max-age option to the Cache-Control header in the response. So the cache clear was working properly, clearing the local memory on the server, but the user's browser was instructed not to ask the server for updated data for the duration of the timeout. And my browser was happily complying (even after Ctrl-F5).
Fortunately, there are other decorators you can use to deal with this without much difficulty, now that it's clear what's happening. Django provides a number of other decorators, such as cache_control or never_cache.
I ended up using never_cache, which turned the urls files into...
from django.conf.urls import include, url
from django.views.decorators.cache import never_cache, cache_page
app_name = 'map'
urlpatterns = [
    url(r'^(?P<tg>[\w-]+)/data.geojson$',
        never_cache(cache_page(60 * 60 * 12)(
            views.NamedGeoJSONLayerView.as_view(model=FacilityCoord))),
        name='tg-data'),
]
and
from django.conf.urls import include, url
from django.views.decorators.cache import never_cache, cache_page
app_name = 'resources'
urlpatterns = [
    url(r'^download/$',
        never_cache(cache_page(60 * 60 * 12)(
            views.DownloadMetricXLSX.as_view())),
        name='download'),
]
I recently migrated from database backed sessions to sessions stored via memcached using pylibmc.
Here is my CACHES, SESSION_CACHE_ALIAS & SESSION_ENGINE in my settings.py
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.PyLibMCCache',
        'LOCATION': ['127.0.0.1:11211'],
    }
}
SESSION_CACHE_ALIAS = 'default'
SESSION_ENGINE = "django.contrib.sessions.backends.cache"
Everything is working fine behind the scenes and I can see that it is using the new caching system. Running the get_stats() method from pylibmc shows me the number of current items in the cache and I can see that it has gone up by 1.
The issue is I'm unable to grab the session manually using pylibmc.
Upon inspecting the request session data in views.py:
def my_view(request):
    if request.user.is_authenticated():
        print request.session.session_key
        # the above prints something like this: "1ay2kcv7axb3nu5fwnwoyf85wkwsttz9"
        print request.session.cache_key
        # the above prints something like this: "django.contrib.sessions.cache1ay2kcv7axb3nu5fwnwoyf85wkwsttz9"
        return HttpResponse(status=200)
    else:
        return HttpResponse(status=401)
I noticed that when printing cache_key, it prints with the default KEY_PREFIX whereas for session_key it didn't. Take a look at the comments in the code to see what I mean.
So I figured, "Ok great, one of these key names should work. Let me try grabbing the session data manually just for educational purposes":
import pylibmc
mc = pylibmc.Client(['127.0.0.1:11211'])
# Let's try key "1ay2kcv7axb3nu5fwnwoyf85wkwsttz9"
mc.get("1ay2kcv7axb3nu5fwnwoyf85wkwsttz9")
Hmm nothing happens, no key exists by that name. Ok no worries, let's try the cache_key then, that should definitely work right?
mc.get("django.contrib.sessions.cache1ay2kcv7axb3nu5fwnwoyf85wkwsttz9")
What? How am I still getting nothing back? As a test, I set and get a random key value to see if it works, and it does. I run get_stats() again just to make sure that the key does exist. I also test the web app to see if my session is indeed working, and it is. So this leads me to conclude that there is a different naming scheme that I'm unaware of.
If so, what is the correct naming scheme?
Yes, the cache key used internally by Django is, in general, different to the key sent to the cache backend (in this case pylibmc / memcached). Let us call these two keys the django cache key and the final cache key respectively.
The django cache key given by request.session.cache_key is for use with Django's low-level cache API, e.g.:
>>> from django.core.cache import cache
>>> cache.get(request.session.cache_key)
{'_auth_user_hash': '1ay2kcv7axb3nu5fwnwoyf85wkwsttz9', '_auth_user_id': u'1', '_auth_user_backend': u'django.contrib.auth.backends.ModelBackend'}
The final cache key on the other hand, is a composition of the key prefix, the django cache key, and the cache version number. The make_key function (from Django docs) below demonstrates how these three values are composed to generate this key:
def make_key(key, key_prefix, version):
    return ':'.join([key_prefix, str(version), key])
By default, key_prefix is the empty string and version is 1.
Finally, by inspecting make_key we find that the correct final cache key to pass to mc.get is
:1:django.contrib.sessions.cache1ay2kcv7axb3nu5fwnwoyf85wkwsttz9
which has the form <KEY_PREFIX>:<VERSION>:<KEY>.
Note: the final cache key can be changed by defining KEY_FUNCTION in the cache settings.
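As a quick check of the composition described above, the sketch below reimplements the default key function in plain Python (no Django or memcached required) and applies it to the session key from the question. The defaults for `key_prefix` and `version` follow the values stated above; the `mysite` prefix is a made-up example.

```python
def make_key(key, key_prefix="", version=1):
    # Same composition as the default key function quoted above.
    return ":".join([key_prefix, str(version), key])

session_key = "1ay2kcv7axb3nu5fwnwoyf85wkwsttz9"
django_cache_key = "django.contrib.sessions.cache" + session_key

final_key = make_key(django_cache_key)
print(final_key)

# With a hypothetical KEY_PREFIX of "mysite" the prefix simply goes in front.
prefixed_key = make_key(django_cache_key, key_prefix="mysite")
print(prefixed_key)
```

The first value is the string to pass to mc.get when no KEY_PREFIX, VERSION, or KEY_FUNCTION is configured.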
I have a series of caches which follow this pattern:
key_x_y = value
Like:
'key_1_3' = 'foo'
'key_2_5' = 'bar'
'key_1_7' = 'baz'
Now I'm wondering how can I iterate over all keys to match pattern like key_1_* to get foo and baz using the native django cache.get()?
(I know that there are ways, particularly for Redis, that allow using a more extensive API such as iterating over keys, but I'd like to stick to the vanilla Django cache, if possible.)
This is not possible using Django's standard cache wrapper, because searching keys by pattern is a backend-dependent operation that is not supported by all of the cache backends Django works with (e.g. memcached does not support it, but Redis does). So you will have to use a custom cache wrapper with a cache backend that supports this operation.
Edit:
If you are already using django-redis then you can do
from django.core.cache import cache
cache.keys("foo_*")
as explained here.
This returns a list of keys matching the pattern; you can then use cache.get_many() to get the values for these keys.
cache.get_many(cache.keys("key_1_*"))
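If the cache were a plain Python dict with the following entries:
cache = {'key_1_3': 'foo', 'key_2_5': 'bar', 'key_1_7': 'baz'}
You could get all the entries whose key matches key_1_* (note that Django's cache object has no items() method, so this only works on a dict you maintain yourself):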
x = {k: v for k, v in cache.items() if k.startswith('key_1')}
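For a plain dict like the one above, a prefix test such as startswith('key_1') is looser than the glob pattern key_1_*: it would also match a key like key_10_2. A small sketch using the standard library's fnmatch module, with a made-up key_10_2 entry added to show the difference:

```python
from fnmatch import fnmatchcase

# 'key_10_2' is a hypothetical extra entry, added only to show the difference.
entries = {'key_1_3': 'foo', 'key_2_5': 'bar', 'key_1_7': 'baz', 'key_10_2': 'qux'}

def match_entries(entries, pattern):
    """Return the entries whose key matches a shell-style glob pattern."""
    return {k: v for k, v in entries.items() if fnmatchcase(k, pattern)}

matched = match_entries(entries, 'key_1_*')
by_prefix = {k: v for k, v in entries.items() if k.startswith('key_1')}
print(matched)
print(sorted(by_prefix))
```

This only filters keys you already hold in Python; it does not give Django's cache API a way to enumerate keys stored in the backend.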
Based on the documentation from django-redis
You can list all the keys with a pattern:
>>> from django.core.cache import cache
>>> cache.keys("key_1_*")
# ["key_1_3", "key_1_7"]
once you have the keys you can get the values from this:
>>> [cache.get(k) for k in cache.keys("key_1_*")]
# ['foo', 'baz']
You can also use cache.iter_keys(pattern), which returns a generator and scans the keys incrementally instead of loading them all at once.
Or, as suggested by @Muhammad Tahir, you can use cache.get_many(cache.keys("key_1_*")) to get all the values in one go.
I saw several answers above mentioning django-redis.
Based on https://pypi.org/project/django-redis/
You can actually use delete_pattern() method
from django.core.cache import cache
cache.delete_pattern('key_1_*')
I understand that Django's cache functions expire after a specified time interval has elapsed (e.g. 1 minute, 1 hour, etc.), but I have some content that changes on a daily basis (e.g. "message of the day"). Ideally this would be cached for 24 hours, but if I set the timeout to 24 hours there's no guarantee that the cache will expire precisely at midnight. What is the best practice for handling this situation?
Two easy options spring to mind, both involving a scheduled task that needs to run at (say) midnight.
1) Get ahead of the game: Schedule some code to run at midnight (e.g. a custom management command) that requests your 'message of the day' content and caches it with a 24hr expiry. (This assumes the relevant cache key is not set yet.)
2) Go nuclear: schedule a cache purge at midnight
or, combining the two:
Don't go nuclear, just schedule a call to only delete the MOTD key (e.g. cache.delete('motd_key')) at midnight, then cache the new one instead.
Alternatively, if you use Redis as your cache backend, you could cache the MOTD, then make an EXPIREAT call to set that cached MOTD entry to expire at 23:59:59. redis-py will let you do that in a Pythonic way.
If you're using Memcached as your backend, you don't get that level of control.
(And if you're using locmem://, you're Doing It Wrong ;o) )
Why not just implement a custom cache instead of introducing another side effect like scheduled jobs?
Create a cache class like so:
from datetime import datetime, timedelta, timezone
from django.core.cache.backends.locmem import LocMemCache
class MidnightCacher(LocMemCache):
    def __init__(self, name, params):
        super().__init__(name, params)
    def get_backend_timeout(self, timeout=None):
        # Ignore the requested timeout and return the absolute expiry time:
        # the Unix timestamp of the next midnight (UTC).
        tomorrow = datetime.now(timezone.utc) + timedelta(days=1)
        return tomorrow.replace(hour=0, minute=0, second=0, microsecond=0).timestamp()
Configure your cache in settings.py
CACHES = {
    'midnight': {
        'BACKEND': 'backend.midnight_cache.MidnightCacher',
        'LOCATION': 'unique-snowflake',
    }
}
And finally, decorate your view:
from django.views.decorators.cache import cache_page
@cache_page(1, cache="midnight")
def get_something(request):
    pass
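The expiry computation can be checked on its own, without Django. The sketch below uses two hypothetical helper names and assumes "midnight" means midnight UTC; swap in the site's own timezone if the message should roll over at local midnight.

```python
from datetime import datetime, timedelta, timezone

def next_utc_midnight(now=None):
    """Return the next midnight (UTC) strictly after `now`."""
    now = now or datetime.now(timezone.utc)
    tomorrow = now + timedelta(days=1)
    return tomorrow.replace(hour=0, minute=0, second=0, microsecond=0)

def seconds_until_midnight(now=None):
    """Return how many seconds remain until the next midnight (UTC)."""
    now = now or datetime.now(timezone.utc)
    return (next_utc_midnight(now) - now).total_seconds()

# Fixed example time so the result is reproducible.
example_now = datetime(2024, 1, 15, 22, 30, tzinfo=timezone.utc)
print(next_utc_midnight(example_now))
print(seconds_until_midnight(example_now))
```

With any standard backend, the second helper can also be passed as an ordinary timeout, e.g. cache.set('motd_key', motd, timeout=seconds_until_midnight()), which avoids both the scheduled job and the custom backend.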
My django application deals with 25MB binary files. Each of them has about 100,000 "records" of 256 bytes each.
It takes me about 7 seconds to read the binary file from disk and decode it using python's struct module. I turn the data into a list of about 100,000 items, where each item is a dictionary with values of various types (float, string, etc.).
My django views need to search through this list. Clearly 7 seconds is too long.
I've tried using django's low-level caching API to cache the whole list, but that won't work because there's a maximum size limit of 1MB for any single cached item. I've tried caching the 100,000 list items individually, but that takes a lot more than 7 seconds - most of the time is spent unpickling the items.
Is there a convenient way to store a large list in memory between requests? Can you think of another way to cache the object for use by my django app?
To raise the item size limit to 10MB (the default is 1MB), add
-I 10m
to /etc/memcached.conf and restart memcached.
Also edit this class in memcached.py, located in /usr/lib/python2.7/dist-packages/django/core/cache/backends, to look like this:
class MemcachedCache(BaseMemcachedCache):
    "An implementation of a cache binding using python-memcached"
    def __init__(self, server, params):
        import memcache
        memcache.SERVER_MAX_VALUE_LENGTH = 1024*1024*10 #added limit to accept 10mb
        super(MemcachedCache, self).__init__(server, params,
                                             library=memcache,
                                             value_not_found_exception=ValueError)
I'm not able to add comments yet, but I wanted to share my quick fix around this problem, since I had the same problem with python-memcached behaving strangely when you change the SERVER_MAX_VALUE_LENGTH at import time.
Well, besides the __init__ edit that FizxMike suggests you can also edit the _cache property in the same class. Doing so you can instantiate the python-memcached Client passing the server_max_value_length explicitly, like this:
from django.core.cache.backends.memcached import BaseMemcachedCache
DEFAULT_MAX_VALUE_LENGTH = 1024 * 1024
class MemcachedCache(BaseMemcachedCache):
    def __init__(self, server, params):
        #options from the settings['CACHE'][connection]
        self._options = params.get("OPTIONS", {})
        import memcache
        memcache.SERVER_MAX_VALUE_LENGTH = self._options.get('SERVER_MAX_VALUE_LENGTH', DEFAULT_MAX_VALUE_LENGTH)
        super(MemcachedCache, self).__init__(server, params,
                                             library=memcache,
                                             value_not_found_exception=ValueError)
    @property
    def _cache(self):
        if getattr(self, '_client', None) is None:
            server_max_value_length = self._options.get("SERVER_MAX_VALUE_LENGTH", DEFAULT_MAX_VALUE_LENGTH)
            #one could optionally send more parameters here through the options settings,
            #I simplified here for brevity
            self._client = self._lib.Client(self._servers,
                                            server_max_value_length=server_max_value_length)
        return self._client
I also prefer to create another backend that inherits from BaseMemcachedCache and use it instead of editing django code.
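A settings sketch for wiring up such a separate backend might look like the following. The module path myproject.cache_backends is hypothetical (wherever the class above is saved), and the option name matches the key that the class reads from OPTIONS; this targets the older Django versions discussed in this thread and has not been checked against newer releases.

```python
# settings.py (sketch): 'myproject.cache_backends' is a hypothetical module
# containing the MemcachedCache subclass shown above.
CACHES = {
    'default': {
        'BACKEND': 'myproject.cache_backends.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
        'OPTIONS': {
            # Read by the custom backend; must not exceed memcached's own -I limit.
            'SERVER_MAX_VALUE_LENGTH': 1024 * 1024 * 10,
        },
    }
}
```

The memcached server itself still needs the -I 10m setting described earlier, otherwise it will reject the larger values.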
here's the django memcached backend module for reference:
https://github.com/django/django/blob/master/django/core/cache/backends/memcached.py
Thanks for all the help on this thread!