How to match redis key patterns using native django cache? - django

I have a series of caches which follow this pattern:
key_x_y = value
Like:
'key_1_3' = 'foo'
'key_2_5' = 'bar'
'key_1_7' = 'baz'
Now I'm wondering how can I iterate over all keys to match pattern like key_1_* to get foo and baz using the native django cache.get()?
(I know that there are way, particularly for redis, that allow using more extensive api like iterate, but I'd like to stick to vanilla django cache, if possible)

This is not possible using standard Django's cache wrapper. As the feature to search keys by pattern is a backend dependent operation and not supported by all the cache backends used by Django (e.g. memcached does not support it but Redis does). So you will have to use a custom cache wrapper with cache backend that supports this operation.
Edit:
If you are already using django-redis then you can do
from django.core.cache import cache
cache.keys("foo_*")
as explained here.
This will return list of keys matching the pattern then you can use cache.get_many() to get values for these keys.
cache.get_many(cache.keys("key_1_*"))

If the cache has following entries:
cache = {'key_1_3': 'foo', 'key_2_5': 'bar', 'key_1_7': 'baz'}
You can get all the entries which has key key_1_*:
x = {k: v for k, v in cache.items() if k.startswith('key_1')}
Based on the documentation from django-redis
You can list all the keys with a pattern:
>>> from django.core.cache import cache
>>> cache.keys("key_1_*")
# ["key_1_3", "key_1_7"]
once you have the keys you can get the values from this:
>>> [cache.get(k) for k in cache.keys("key_1_*")]
# ['foo', 'baz']
You can also use cache.iter_keys(pattern) for efficient implementation.
Or, as suggested by #Muhammad Tahir, you can use cache.get_many(cache.keys("key_1_*")) to get all the values in one go.

I saw several answers above mentioning django-redis.
Based on https://pypi.org/project/django-redis/
You can actually use delete_pattern() method
from django.core.cache import cache
cache.delete_pattern('key_1_*')

Related

How can I retrieve all keys from a flask cache?

I'm debugging a flask app and want to see which values were stored in my simple cache. Is there a way to retrieve all keys? (The same way you might with a dictionary...
cache = Cache()
cache.init_app(app, config={"CACHE_TYPE": "simple"})
cache.set("item-1", "red")
cache.set("item-2", "blue")
# I would like to do the following:
# cache.keys()
Based on the source code for Flask-Caching (Don't use Flask-Cache cause it's very dated)...
There doesn't appear to be a built in method to get all the values without providing the keys, but for debugging you could do something like:
>>> for k in cache.cache._cache:
... print (k, cache.get(k))
...
item-1 red
item-2 blue
This appears to return a value of None for expired items:
item-1 None
cache.cache._cache is the dictionary with pickled values.
However you should also be aware that the 'simple' cache isn't really thread safe, as it only uses a dictionary for strorage. You should switch to a different backend like Redis for larger apps.
When using redis as the backend for flask_caching there is no method exposed to query all the keys.
But we can query the redis instance ourselves.
From the source
k_prefix = cache.cache.key_prefix
keys = cache.cache._write_client.keys(k_prefix + '*')
keys = [k.decode('utf8') for k in keys]
keys = [k.replace(k_prefix, '') for k in keys]
print(keys)
values = cache.get_many(*keys)
print(values)

Django Sessions via Memcache: Cannot find session key manually

I recently migrated from database backed sessions to sessions stored via memcached using pylibmc.
Here is my CACHES, SESSION_CACHE_ALIAS & SESSION_ENGINE in my settings.py
CACHES = {
'default': {
'BACKEND': 'django.core.cache.backends.memcached.PyLibMCCache',
'LOCATION': ['127.0.0.1:11211'],
}
}
SESSION_CACHE_ALIAS = 'default'
SESSION_ENGINE = "django.contrib.sessions.backends.cache"
Everything is working fine behind the scenes and I can see that it is using the new caching system. Running the get_stats() method from pylibmc shows me the number of current items in the cache and I can see that it has gone up by 1.
The issue is I'm unable to grab the session manually using pylibmc.
Upon inspecting the request session data in views.py:
def my_view(request):
if request.user.is_authenticated():
print request.session.session_key
# the above prints something like this: "1ay2kcv7axb3nu5fwnwoyf85wkwsttz9"
print request.session.cache_key
# the above prints something like this: "django.contrib.sessions.cache1ay2kcv7axb3nu5fwnwoyf85wkwsttz9"
return HttpResponse(status=200)
else:
return HttpResponse(status=401)
I noticed that when printing cache_key, it prints with the default KEY_PREFIX whereas for session_key it didn't. Take a look at the comments in the code to see what I mean.
So I figured, "Ok great, one of these key names should work. Let me try grabbing the session data manually just for educational purposes":
import pylibmc
mc = pylibmc.Client(['127.0.0.1:11211'])
# Let's try key "1ay2kcv7axb3nu5fwnwoyf85wkwsttz9"
mc.get("1ay2kcv7axb3nu5fwnwoyf85wkwsttz9")
Hmm nothing happens, no key exists by that name. Ok no worries, let's try the cache_key then, that should definitely work right?
mc.get("django.contrib.sessions.cache1ay2kcv7axb3nu5fwnwoyf85wkwsttz9")
What? How am I still getting nothing back? As I test I decide to set and get a random key value to see if it works and it does. I run get_stats() again just to make sure that the key does exist. I also test the web app to see if indeed my session is working and it does. So this leads me to conclude that there is a different naming scheme that I'm unaware of.
If so, what is the correct naming scheme?
Yes, the cache key used internally by Django is, in general, different to the key sent to the cache backend (in this case pylibmc / memcached). Let us call these two keys the django cache key and the final cache key respectively.
The django cache key given by request.session.cache_key is for use with Django's low-level cache API, e.g.:
>>> from django.core.cache import cache
>>> cache.get(request.session.cache_key)
{'_auth_user_hash': '1ay2kcv7axb3nu5fwnwoyf85wkwsttz9', '_auth_user_id': u'1', '_auth_user_backend': u'django.contrib.auth.backends.ModelBackend'}
The final cache key on the other hand, is a composition of the key prefix, the django cache key, and the cache version number. The make_key function (from Django docs) below demonstrates how these three values are composed to generate this key:
def make_key(key, key_prefix, version):
return ':'.join([key_prefix, str(version), key])
By default, key_prefix is the empty string and version is 1.
Finally, by inspecting make_key we find that the correct final cache key to pass to mc.get is
:1:django.contrib.sessions.cache1ay2kcv7axb3nu5fwnwoyf85wkwsttz9
which has the form <KEY_PREFIX>:<VERSION>:<KEY>.
Note: the final cache key can be changed by defining KEY_FUNCTION in the cache settings.

Django Redis cache values

I have set the value to Redis server externally using python script.
r = redis.StrictRedis(host='localhost', port=6379, db=1)
r.set('foo', 'bar')
And tried to get the value from web request using django cache inside views.py.
from django.core.cache import cache
val = cache.get("foo")
It is returning None. But when I tries to get it form
from django_redis import get_redis_connection
con = get_redis_connection("default")
val = con.get("foo")
It is returning the correct value 'bar'. How cache and direct connections are working ?
Libraries usually use several internal prefixes to store keys in redis, in order not to be mistaken with user defined keys.
For example, django-redis-cache, prepends a ":1:" to every key you save into it.
So for example when you do r.set('foo', 'bar'), it sets the key to, ":1:foo". Since you don't know the prefix prepended to your key, you can't get the key using a normal get, you have to use it's own API to get.
r.set('foo', 'bar')
r.get('foo') # None
r.get(':1:foo') # bar
So in the end, it returns to the library you use, go read the code for it and see how it exactly saves the keys. redis-cli can be your valuable friend here. Basically set a key with cache.set('foo', 'bar'), and go into redis-cli and check with 'keys *' command to see what key was set for foo.

how to write a query to get find value in a json field in django

I have a json field in my database which is like
jsonfield = {'username':'chingo','reputation':'5'}
how can i write a query so that i can find if a user name exists. something like
username = 'chingo'
query = User.objects.get(jsonfield['username']=username)
I know the above query is a wrong but I wanted to know if there is a way to access it?
If you are using the django-jsonfield package, then this is simple. Say you have a model like this:
from jsonfield import JSONField
class User(models.Model):
jsonfield = JSONField()
Then to search for records with a specific username, you can just do this:
User.objects.get(jsonfield__contains={'username':username})
Since Django 1.9, you have been able to use PostgreSQL's native JSONField. This makes search JSON very simple. In your example, this query would work:
User.objects.get(jsonfield__username='chingo')
If you have an older version of Django, or you are using the Django JSONField library for compatibility with MySQL or something similar, you can still perform your query.
In the latter situation, jsonfield will be stored as a text field and mapped to a dict when brought into Django. In the database, your data will be stored like this
{"username":"chingo","reputation":"5"}
Therefore, you can simply search the text. Your query in this siutation would be:
User.objects.get(jsonfield__contains='"username":"chingo"')
2019: As #freethebees points out it's now as simple as:
User.objects.get(jsonfield__username='chingo')
But as the doc examples mention you can query deeply, and if the json is an array you can use an integer to index it:
https://docs.djangoproject.com/en/2.2/ref/contrib/postgres/fields/#querying-jsonfield
>>> Dog.objects.create(name='Rufus', data={
... 'breed': 'labrador',
... 'owner': {
... 'name': 'Bob',
... 'other_pets': [{
... 'name': 'Fishy',
... }],
... },
... })
>>> Dog.objects.create(name='Meg', data={'breed': 'collie', 'owner': None})
>>> Dog.objects.filter(data__breed='collie')
<QuerySet [<Dog: Meg>]>
>>> Dog.objects.filter(data__owner__name='Bob')
<QuerySet [<Dog: Rufus>]>
>>> Dog.objects.filter(data__owner__other_pets__0__name='Fishy')
<QuerySet [<Dog: Rufus>]>
Although this is for postgres, I believe it works the same in other DBs like MySQL
Postgres: https://docs.djangoproject.com/en/2.2/ref/contrib/postgres/fields/#querying-jsonfield
MySQL: https://django-mysql.readthedocs.io/en/latest/model_fields/json_field.html#querying-jsonfield
This usage is somewhat anti-pattern. Also, its implementation is not going to have regular performance, and perhaps is error-prone.
Normally don't use jsonfield when you need to look up through fields. Use the way the RDBMS provides or MongoDB(which internally operates on faster BSON), as Daniel pointed out.
Due to the deterministic of JSON format,
you could achieve it by using contains (regex has issue when dealing w/ multiple '\' and even slower), I don't think it's good to use username in this way, so use name instead:
def make_cond(name, value):
from django.utils import simplejson
cond = simplejson.dumps({name:value})[1:-1] # remove '{' and '}'
return ' ' + cond # avoid '\"'
User.objects.get(jsonfield__contains=make_cond(name, value))
It works as long as
the jsonfield using the same dump utility (the simplejson here)
name and value are not too special (I don't know any egde-case so far, maybe someone could point it out)
your jsonfield data is not corrupt (unlikely though)
Actually I'm working on a editable jsonfield and thinking about whether to support such operations. The negative proof is as said above, it feels like some black-magic, well.
If you use PostgreSQL you can use raw sql to solve problem.
username = 'chingo'
SQL_QUERY = "SELECT true FROM you_table WHERE jsonfield::json->>'username' = '%s'"
User.objects.extra(where=[SQL_EXCLUDE % username]).get()
where you_table is name of table in your database.
Any methods when you work with JSON like with plain text - looking like very bad way.
So, also I think that you need a better schema of database.
Here is the way I have found out that will solve your problem:
search_filter = '"username":{0}'.format(username)
query = User.objects.get(jsonfield__contains=search_filter)
Hope this helps.
You can't do that. Use normal database fields for structured data, not JSON blobs.
If you need to search on JSON data, consider using a noSQL database like MongoDB.

Hierarchical cache in Django

What I want to do is to mark some values in the cache as related so I could delete them at once. For example when I insert a new entry to the database I want to delete everything in the cache which was based on the old values in database.
I could always use cache.clear() but it seems too brutal to me. Or I could store related values together in the dictionary and cache this dictionary. Or I could maintain some kind of index in an extra field in cache. But everything seems to complicated to me (eventually slow?).
What you think? Is there any existing solution? Or is my approach wrong? Thanks for answers.
Are you using the cache api? It sounds like it.
This post, which pointed me to these slides helped me create a nice generational caching system which let me create the hierarchy I wanted.
In short, you store a generation key (such as group) in your cache and incorporate the value stored into your key creation function so that you can invalidate a whole set of keys at once.
With this basic concept you could create highly complex hierarchies or just a simple group system.
For example:
class Cache(object):
def generate_cache_key(self, key, group=None):
"""
Generate a cache key relating them via an outside source (group)
Generates key such as 'group-1:KEY-your-key-here'
Note: consider this pseudo code and definitely incomplete code.
"""
key_fragments = [('key', key)]
if group:
key_fragments.append((group, cache.get(group, '1')))
combined_key = ":".join(['%s-%s' % (name, value) for name, value in key_fragments)
hashed_key = md5(combined_key).hexdigest()
return hashed_key
def increment_group(self, group):
"""
Invalidate an entire group
"""
cache.incr(group)
def set(self, key, value, group=None):
key = self.generate_cache_key(key, group)
cache.set(key, value)
def get(self, key, group=None):
key = self.generate_cache_key(key, group)
return cache.get(key)
# example
>>> cache = Cache()
>>> cache.set('key', 'value', 'somehow_related')
>>> cache.set('key2', 'value2', 'somehow_related')
>>> cache.increment_group('somehow_related')
>>> cache.get('key') # both invalidated
>>> cache.get('key2') # both invalidated
Caching a dict or something serialised (with JSON or the like) sounds good to me. The cache backends are key-value stores like memcache, they aren't hierarchical.