How can I retrieve all keys from a flask cache? - flask

I'm debugging a Flask app and want to see which values were stored in my simple cache. Is there a way to retrieve all keys? (The same way you might with a dictionary...)
cache = Cache()
cache.init_app(app, config={"CACHE_TYPE": "simple"})
cache.set("item-1", "red")
cache.set("item-2", "blue")
# I would like to do the following:
# cache.keys()

Based on the source code for Flask-Caching (don't use Flask-Cache, because it's very dated), there doesn't appear to be a built-in method to get all the values without providing the keys, but for debugging you could do something like:
>>> for k in cache.cache._cache:
...     print(k, cache.get(k))
...
item-1 red
item-2 blue
This appears to return a value of None for expired items:
item-1 None
cache.cache._cache is the dictionary with pickled values.
However, you should also be aware that the 'simple' cache isn't really thread safe, as it only uses a dictionary for storage. You should switch to a different backend like Redis for larger apps.
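To see why the debugging loop can print None for a key that is still listed, here is a minimal model of a dictionary-backed cache. `TinySimpleCache`, its `clock` parameter and the tuple layout are hypothetical names chosen for illustration, not the Flask-Caching source. It assumes, as described above, that each entry holds a pickled value, and additionally assumes an expiry time is stored next to it.

```python
import pickle
import time


class TinySimpleCache:
    """Hypothetical stand-in for a dictionary-backed cache (illustration only)."""

    def __init__(self, clock=time.time):
        self._cache = {}      # key -> (expires_at, pickled value)
        self._clock = clock   # injectable so expiry can be demonstrated

    def set(self, key, value, timeout=300):
        self._cache[key] = (self._clock() + timeout, pickle.dumps(value))

    def get(self, key):
        entry = self._cache.get(key)
        if entry is None:
            return None
        expires_at, payload = entry
        if expires_at <= self._clock():
            return None       # expired entries read as None but stay in the dict
        return pickle.loads(payload)


now = [1000.0]                          # fake clock that can be moved forward
demo = TinySimpleCache(clock=lambda: now[0])
demo.set("item-1", "red", timeout=10)
demo.set("item-2", "blue", timeout=100)

now[0] += 50                            # item-1 has expired, item-2 has not
snapshot = {k: demo.get(k) for k in demo._cache}
print(snapshot)
```

In this model the expired key is still present in the underlying dictionary, so iterating over the dictionary lists it, while `get` reports None for it.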

When using Redis as the backend for flask_caching, there is no method exposed to query all the keys.
But we can query the Redis instance ourselves.
From the source:
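k_prefix = cache.cache.key_prefix
# KEYS walks the whole keyspace, so keep this to debugging sessions
keys = cache.cache._write_client.keys(k_prefix + '*')
keys = [k.decode('utf8') for k in keys]
# strip only the leading prefix, not every occurrence of it inside the key
keys = [k[len(k_prefix):] for k in keys]
print(keys)
values = cache.get_many(*keys)
print(values)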
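When stripping the cache prefix from keys returned by Redis, it matters that only the leading occurrence is removed. The following is a minimal sketch of that step in plain Python; `strip_prefix` is a hypothetical helper and the sample prefix and keys are made up for illustration.

```python
def strip_prefix(raw_keys, prefix):
    """Decode raw Redis keys and drop only the leading cache prefix."""
    cleaned = []
    for raw in raw_keys:
        key = raw.decode("utf8") if isinstance(raw, bytes) else raw
        if key.startswith(prefix):
            key = key[len(prefix):]
        cleaned.append(key)
    return cleaned


# Sample data shaped like the bytes a Redis client returns for a key listing
raw = [b"flask_cache_item-1", b"flask_cache_my_flask_cache_item"]
stripped = strip_prefix(raw, "flask_cache_")
print(stripped)
```

The second sample key contains the prefix twice; slicing by length keeps the inner occurrence intact, whereas a blanket string replacement would remove both.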

Related

Django Sessions via Memcache: Cannot find session key manually

I recently migrated from database backed sessions to sessions stored via memcached using pylibmc.
Here are my CACHES, SESSION_CACHE_ALIAS and SESSION_ENGINE settings in settings.py:
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.PyLibMCCache',
        'LOCATION': ['127.0.0.1:11211'],
    }
}
SESSION_CACHE_ALIAS = 'default'
SESSION_ENGINE = "django.contrib.sessions.backends.cache"
Everything is working fine behind the scenes and I can see that it is using the new caching system. Running the get_stats() method from pylibmc shows me the number of current items in the cache and I can see that it has gone up by 1.
The issue is I'm unable to grab the session manually using pylibmc.
Upon inspecting the request session data in views.py:
def my_view(request):
    if request.user.is_authenticated():
        print(request.session.session_key)
        # the above prints something like this: "1ay2kcv7axb3nu5fwnwoyf85wkwsttz9"
        print(request.session.cache_key)
        # the above prints something like this: "django.contrib.sessions.cache1ay2kcv7axb3nu5fwnwoyf85wkwsttz9"
        return HttpResponse(status=200)
    else:
        return HttpResponse(status=401)
I noticed that when printing cache_key, it prints with the default KEY_PREFIX whereas for session_key it didn't. Take a look at the comments in the code to see what I mean.
So I figured, "Ok great, one of these key names should work. Let me try grabbing the session data manually just for educational purposes":
import pylibmc
mc = pylibmc.Client(['127.0.0.1:11211'])
# Let's try key "1ay2kcv7axb3nu5fwnwoyf85wkwsttz9"
mc.get("1ay2kcv7axb3nu5fwnwoyf85wkwsttz9")
Hmm nothing happens, no key exists by that name. Ok no worries, let's try the cache_key then, that should definitely work right?
mc.get("django.contrib.sessions.cache1ay2kcv7axb3nu5fwnwoyf85wkwsttz9")
What? How am I still getting nothing back? As a test, I set and get a random key-value pair to see if it works, and it does. I run get_stats() again just to make sure that the key does exist. I also test the web app to see if my session is indeed working, and it is. So this leads me to conclude that there is a different naming scheme that I'm unaware of.
If so, what is the correct naming scheme?
Yes, the cache key used internally by Django is, in general, different to the key sent to the cache backend (in this case pylibmc / memcached). Let us call these two keys the django cache key and the final cache key respectively.
The django cache key given by request.session.cache_key is for use with Django's low-level cache API, e.g.:
>>> from django.core.cache import cache
>>> cache.get(request.session.cache_key)
{'_auth_user_hash': '1ay2kcv7axb3nu5fwnwoyf85wkwsttz9', '_auth_user_id': u'1', '_auth_user_backend': u'django.contrib.auth.backends.ModelBackend'}
The final cache key on the other hand, is a composition of the key prefix, the django cache key, and the cache version number. The make_key function (from Django docs) below demonstrates how these three values are composed to generate this key:
def make_key(key, key_prefix, version):
    return ':'.join([key_prefix, str(version), key])
By default, key_prefix is the empty string and version is 1.
Finally, by inspecting make_key we find that the correct final cache key to pass to mc.get is
:1:django.contrib.sessions.cache1ay2kcv7axb3nu5fwnwoyf85wkwsttz9
which has the form <KEY_PREFIX>:<VERSION>:<KEY>.
Note: the final cache key can be changed by defining KEY_FUNCTION in the cache settings.
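As a quick check of the composition described above, the same function can be run outside Django. This is a standalone sketch that copies the `make_key` logic shown in the answer; the default argument values mirror the stated defaults, and the "myapp" prefix and version 2 are hypothetical values used only for contrast.

```python
def make_key(key, key_prefix="", version=1):
    """Compose the final cache key as <KEY_PREFIX>:<VERSION>:<KEY>."""
    return ":".join([key_prefix, str(version), key])


session_key = "1ay2kcv7axb3nu5fwnwoyf85wkwsttz9"
django_cache_key = "django.contrib.sessions.cache" + session_key

# Default settings: empty prefix, version 1
final_key = make_key(django_cache_key)
print(final_key)

# Hypothetical settings: KEY_PREFIX "myapp" and VERSION 2
prefixed_key = make_key(django_cache_key, key_prefix="myapp", version=2)
print(prefixed_key)
```

The first result is the string that would be passed to mc.get in the scenario above, assuming the default key function is in use.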

Django Redis cache values

I set a value on the Redis server externally using a Python script.
import redis
r = redis.StrictRedis(host='localhost', port=6379, db=1)
r.set('foo', 'bar')
I then tried to get the value from a web request using the Django cache inside views.py.
from django.core.cache import cache
val = cache.get("foo")
It returns None. But when I try to get it from a direct connection:
from django_redis import get_redis_connection
con = get_redis_connection("default")
val = con.get("foo")
It returns the correct value 'bar'. How do the cache and direct connections work?
Libraries usually add internal prefixes to the keys they store in Redis, so that they are not mistaken for user-defined keys.
For example, django-redis-cache prepends ":1:" to every key you save through it.
So when you call cache.set('foo', 'bar'), the value is stored under the key ":1:foo". A key written directly with r.set('foo', 'bar') has no such prefix, so cache.get('foo') looks for ":1:foo" and finds nothing. Unless you know the prefix, you can't fetch a cached key with a plain get; you have to use the library's own API.
cache.set('foo', 'bar')
r.get('foo') # None
r.get(':1:foo') # the stored value for 'bar', in the library's serialized form
So in the end, it comes down to the library you use: go read its code and see exactly how it saves the keys. redis-cli can be a valuable friend here. Set a key with cache.set('foo', 'bar'), then go into redis-cli and run the 'keys *' command to see what key was set for foo.
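The mismatch between a raw write and a read through the cache layer can be reproduced with a plain dictionary standing in for Redis. This is only a sketch: `PrefixedCache` is a hypothetical class, and its prefix format and use of pickle are assumptions made for illustration rather than the behaviour of any particular library.

```python
import pickle


class PrefixedCache:
    """Hypothetical cache wrapper: prefixes keys and pickles values."""

    def __init__(self, store, key_prefix="", version=1):
        self.store = store
        self.key_prefix = key_prefix
        self.version = version

    def _final_key(self, key):
        return "%s:%s:%s" % (self.key_prefix, self.version, key)

    def set(self, key, value):
        self.store[self._final_key(key)] = pickle.dumps(value)

    def get(self, key):
        payload = self.store.get(self._final_key(key))
        return None if payload is None else pickle.loads(payload)


store = {}                      # stands in for the Redis keyspace
cache = PrefixedCache(store)

store["foo"] = b"bar"           # written "externally", bypassing the wrapper
cache.set("greeting", "hello")  # written through the wrapper

print(cache.get("foo"))         # the wrapper looks up ':1:foo', which is absent
print(sorted(store))            # the raw keys actually present in the store
```

Listing the raw keys, which is what 'keys *' does in redis-cli, makes the difference between the two naming schemes visible at a glance.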

How to match redis key patterns using native django cache?

I have a series of caches which follow this pattern:
key_x_y = value
Like:
'key_1_3' = 'foo'
'key_2_5' = 'bar'
'key_1_7' = 'baz'
Now I'm wondering how I can iterate over all keys matching a pattern like key_1_* to get foo and baz using the native Django cache.get().
(I know that there are ways, particularly for Redis, that allow using a more extensive API like iterate, but I'd like to stick to the vanilla Django cache, if possible.)
This is not possible using Django's standard cache wrapper, because searching keys by pattern is a backend-dependent operation that is not supported by all the cache backends Django uses (e.g. memcached does not support it, but Redis does). So you will have to use a custom cache wrapper with a cache backend that supports this operation.
Edit:
If you are already using django-redis then you can do
from django.core.cache import cache
cache.keys("foo_*")
as explained here.
This returns a list of keys matching the pattern; you can then use cache.get_many() to get the values for these keys.
cache.get_many(cache.keys("key_1_*"))
If the cache has the following entries:
cache = {'key_1_3': 'foo', 'key_2_5': 'bar', 'key_1_7': 'baz'}
You can get all the entries whose keys match key_1_*:
x = {k: v for k, v in cache.items() if k.startswith('key_1_')}
Based on the documentation from django-redis
You can list all the keys with a pattern:
>>> from django.core.cache import cache
>>> cache.keys("key_1_*")
# ["key_1_3", "key_1_7"]
Once you have the keys, you can get the values like this:
>>> [cache.get(k) for k in cache.keys("key_1_*")]
# ['foo', 'baz']
You can also use cache.iter_keys(pattern), which returns a generator and is better suited to a large number of keys.
Or, as suggested by @Muhammad Tahir, you can use cache.get_many(cache.keys("key_1_*")) to get all the values in one go.
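For readers without a Redis instance at hand, the two-step idea (match keys by pattern, then fetch the values in bulk) can be emulated on a plain dictionary. This is a sketch only: `keys_matching` and `get_many` are hypothetical helpers, and the standard library's `fnmatchcase` is used as an approximation of glob-style key patterns.

```python
from fnmatch import fnmatchcase

entries = {"key_1_3": "foo", "key_2_5": "bar", "key_1_7": "baz"}


def keys_matching(store, pattern):
    """Return the keys of `store` that match a glob-style pattern."""
    return [k for k in store if fnmatchcase(k, pattern)]


def get_many(store, keys):
    """Fetch several keys at once, skipping any that are missing."""
    return {k: store[k] for k in keys if k in store}


matched = get_many(entries, keys_matching(entries, "key_1_*"))
print(matched)
```

Note that the trailing underscore in the pattern matters: key_1_* does not match a key such as key_10_1.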
I saw several answers above mentioning django-redis.
Based on https://pypi.org/project/django-redis/, you can also use the delete_pattern() method to delete every key that matches a pattern:
from django.core.cache import cache
cache.delete_pattern('key_1_*')

Hierarchical cache in Django

What I want to do is to mark some values in the cache as related so I could delete them at once. For example when I insert a new entry to the database I want to delete everything in the cache which was based on the old values in database.
I could always use cache.clear(), but that seems too brutal to me. Or I could store related values together in a dictionary and cache that dictionary. Or I could maintain some kind of index in an extra field in the cache. But everything seems too complicated to me (and possibly slow?).
What do you think? Is there an existing solution? Or is my approach wrong? Thanks for your answers.
Are you using the cache api? It sounds like it.
This post, which pointed me to these slides, helped me create a nice generational caching system that let me build the hierarchy I wanted.
In short, you store a generation key (such as group) in your cache and incorporate the value stored into your key creation function so that you can invalidate a whole set of keys at once.
With this basic concept you could create highly complex hierarchies or just a simple group system.
For example:
from hashlib import md5

# imported under another name because the example below rebinds `cache`
from django.core.cache import cache as django_cache


class Cache(object):
    def generate_cache_key(self, key, group=None):
        """
        Generate a cache key relating them via an outside source (group)
        Hashes a string such as 'key-your-key-here:group-1'
        Note: consider this pseudo code and definitely incomplete code.
        """
        key_fragments = [('key', key)]
        if group:
            # the group's current generation number becomes part of the key
            key_fragments.append((group, django_cache.get(group, '1')))
        combined_key = ":".join(['%s-%s' % (name, value) for name, value in key_fragments])
        hashed_key = md5(combined_key.encode('utf-8')).hexdigest()
        return hashed_key

    def increment_group(self, group):
        """
        Invalidate an entire group
        """
        # incr() fails on a missing key, so create the counter first if needed
        django_cache.add(group, 1)
        django_cache.incr(group)

    def set(self, key, value, group=None):
        key = self.generate_cache_key(key, group)
        django_cache.set(key, value)

    def get(self, key, group=None):
        key = self.generate_cache_key(key, group)
        return django_cache.get(key)

# example
>>> cache = Cache()
>>> cache.set('key', 'value', 'somehow_related')
>>> cache.set('key2', 'value2', 'somehow_related')
>>> cache.increment_group('somehow_related')
>>> cache.get('key', 'somehow_related') # both invalidated
>>> cache.get('key2', 'somehow_related') # both invalidated
Caching a dict or something serialised (with JSON or the like) sounds good to me. The cache backends are key-value stores like memcache, they aren't hierarchical.
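The single-dict approach can be sketched as follows, with a plain dictionary standing in for the cache backend. The helper names (`set_in_group`, `get_from_group`, `delete_group`) are hypothetical, and JSON is just one possible serialisation.

```python
import json

store = {}  # stands in for a key-value cache backend


def set_in_group(group, key, value):
    """Store `key` inside a JSON-serialised dict kept under `group`."""
    members = json.loads(store.get(group, "{}"))
    members[key] = value
    store[group] = json.dumps(members)


def get_from_group(group, key):
    """Read one value back out of the group's dict, or None if absent."""
    return json.loads(store.get(group, "{}")).get(key)


def delete_group(group):
    """Drop every related value with a single delete."""
    store.pop(group, None)


set_in_group("somehow_related", "key", "value")
set_in_group("somehow_related", "key2", "value2")
before = get_from_group("somehow_related", "key2")
delete_group("somehow_related")
after = get_from_group("somehow_related", "key2")
print(before, after)
```

The trade-off is that every write is a read-modify-write of the whole dict, which is not atomic and grows more expensive as the group gets larger.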

Why does Django give me different results for the same query?

For a mock web service I wrote a little Django app that serves as a web API, which my Android application queries. When I make requests to the API, I can also pass an offset and limit so that only the necessary data is transmitted. However, I ran into the problem that Django gives me different results for the same query to the API. It seems as if the results are returned round robin.
This is the Django code that will be run:
def getMetaForCategory(request, offset, limit):
    if request.method == "GET":
        result = { "meta_information": [] }
        categoryIDs = request.GET.getlist("category_ids[]")
        categorySet = set(toInt(categoryIDs))
        categories = Category.objects.filter(id__in = categoryIDs)
        metaSet = set([])
        for category in categories:
            metaSet = metaSet | set(category.meta_information.all())
        metaList = list(metaSet)
        metaList.sort()
        for meta in metaList[int(offset):int(limit)]:
            relatedCategoryIDs = getIDs(meta.category_set.all())
            item = {
                "_id": meta.id,
                "name": meta.name,
                "type": meta.type,
                "categories": list(categorySet & set(relatedCategoryIDs))
            }
            result['meta_information'].append(item)
        return HttpResponse(content = simplejson.dumps(result), mimetype = "application/json")
    else:
        return HttpResponse(status = 403)
What happens is the following: if the MetaInformation objects were Foo, Bar, Baz and Blib and I set the limit to 0:2, then I would get [Foo, Bar] on the first request, and the exact same request would return [Baz, Blib] the second time I ran it.
Does anyone see what I am doing wrong here? Or is it the Django cache that somehow gets into my way?
I think the difficulty is that you are using a set to store your objects, and slicing that - and sets have no ordering (they are like dictionaries in that way). So, the results from your query are in fact indeterminate.
There are various implementations of ordered sets around - you could look into using one of them. However, I must say that I think you are doing a lot of unnecessary and expensive unique-ifying and sorting in Python, when most of this could be done directly by the database. For instance, you seem to be trying to get the unique list of Metas that are related to the categories you pass. Well, this could be done in a single ORM query:
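meta_list = MetaInformation.objects.filter(category__id__in=categoryIDs).distinct().order_by('id')
and you could then drop the set, looping and sorting commands. The distinct() call removes duplicates produced by the join across categories, and order_by() gives the slice a stable order.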
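The point about unordered sets can be shown without Django. In the sketch below, `Meta` is a hypothetical stand-in for a MetaInformation row and `page` is a made-up helper. It assumes that `limit` means a number of items rather than an end index, and that sorting by `id` is an acceptable order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meta:
    """Hypothetical stand-in for a MetaInformation row."""
    id: int
    name: str


def page(items, offset, limit):
    """Deduplicate, sort by an explicit key, then slice."""
    unique = set(items)                            # unordered, so never slice this directly
    ordered = sorted(unique, key=lambda m: m.id)   # explicit, repeatable order
    return [m.name for m in ordered[offset:offset + limit]]


rows = [Meta(3, "Baz"), Meta(1, "Foo"), Meta(4, "Blib"), Meta(2, "Bar"), Meta(1, "Foo")]

first_call = page(rows, 0, 2)
second_call = page(list(reversed(rows)), 0, 2)
print(first_call, second_call)
```

Because the slice is taken from a list sorted on a real attribute, the same request yields the same page regardless of the order in which the rows arrive.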