django memcached setting: why does the LOCATION list have tuples?

Just saw this configuration in a project's settings.py:
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'KEY_PREFIX': 'projectabc:',
        'LOCATION': [
            ('10.1.1.1:11211', 1),
            ('10.1.1.2:11211', 1),
            ('10.1.1.3:11211', 1),
            ('10.1.1.4:11211', 1),
        ],
    }
}
Just curious: why are there tuples inside LOCATION, and what is the "1" in each tuple for?

In python-memcached, LOCATION ultimately gets passed to this function. The explicit 1 seems redundant (it is the default weight), but it is a helpful reminder that a weight param exists:
def set_servers(self, servers):
    """
    Set the pool of servers used by this client.

    @param servers: an array of servers.
    Servers can be passed in two forms:
        1. Strings of the form C{"host:port"}, which implies a default weight of 1.
        2. Tuples of the form C{("host:port", weight)}, where C{weight} is
           an integer weight value.
    """

Related

Django unable to persist cached array

Django cannot persist my cached data even if I set the timeout to None.
My settings.py contains this:
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
        'LOCATION': 'django_cache',  # '/var/tmp/django_cache',
        'TIMEOUT': None,
    }
}
I save data with this line of code:
cache.set('array', array, timeout=None)
I fetch data like this:
array = cache.get('array')
try:
    iterator = iter(array)
except TypeError:
    # Array not iterable: my app gets here when the cached data is lost
    # (cache.get() returns None on a miss)
    pass
else:
    # I go through the array and get the needed info
    pass
I don't think it's a MAX_ENTRIES issue because I only have one array with 39 elements.
When the data is lost, the array is no longer iterable (cache.get() returns None).
I also tried file-based caching, because I suspected that restarting the Django app might clear the in-memory cache, but I had the same issue with the following configuration in settings.py:
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.filebased.FileBasedCache',
        'LOCATION': os.path.join(BASE_DIR, 'filecache'),
        'TIMEOUT': None,
    }
}
The "filecache" directory is created and contains a non-empty cache file but I still loose my cached array somehow.

Very slow to write to Django cache

I used to cache a database query in a global variable to speed up my application. Since this is strongly discouraged (and it did generate problems), I want to use some kind of Django cache instead. I tried LocMemCache and DatabaseCache, but both take... about 15 seconds to set my variable (twice as long as it takes to generate the data, which is 7MB in size).
Is that expected? Am I doing something wrong?
(Memcached values are limited to 1MB, and I cannot split my data, which consists of arbitrarily big binary masks.)
Edit: FileBasedCache takes 30s to set as well.
settings.py:
CACHES = {
    'default': {...},
    'stats': {
        'BACKEND': 'django.core.cache.backends.db.DatabaseCache',
        # or 'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
        'LOCATION': 'stats',
    },
}
service.py:
from django.core.cache import caches

def stats_service():
    stats_cache = caches['stats']
    if stats_cache.get('key') is None:
        stats_cache.set('key', data)  # 15s with DatabaseCache, 30s with LocMemCache
    return stats_cache.get('key')
Global variable (super fast) version:
_cache = {}

def stats_service():
    if _cache.get('key') is None:
        _cache['key'] = data
    return _cache['key']
One option may be to use diskcache.DjangoCache. DiskCache extends the Django cache API to support writing and reading binary streams as-is (avoiding pickling). It works particularly well for large values (like those greater than 1MB). DiskCache is an Apache2-licensed, disk- and file-backed cache library, written in pure Python and compatible with Django.
In your case, you could use the ndarray tostring and numpy fromstring methods to quickly convert to/from bytes. Then wrap the bytes with io.BytesIO to store/retrieve in the cache. For example:
import io
import numpy
from django.core.cache import cache

value = cache.get('cache-key', read=True)
if value:
    data = numpy.fromstring(value.read())
    value.close()
else:
    data = ...  # Generate 7MB array.
    cache.set('cache-key', io.BytesIO(data.tostring()), read=True)
DiskCache extends the Django cache API by permitting file-like values which are stored as binary blobs on disk. The Django cache benchmarks page has a discussion and comparison of alternative cache backends.
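For reference, a minimal sketch of wiring DiskCache into Django's CACHES setting (the directory path here is made up):
CACHES = {
    'default': {
        'BACKEND': 'diskcache.DjangoCache',
        'LOCATION': '/tmp/my-django-cache',  # assumed cache directory
    }
}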
This snippet actually works fine: https://djangosnippets.org/snippets/2396/
As I understand it, the only problem with using global variables for caching is thread safety, and this no-pickle version is thread-safe.

django-cache-machine and Redis

I'm trying to use django-cache-machine to cache queries within my application, but I want to use Redis as a backend. The docs don't really explain how to do this, yet the repository is filled with Redis references, so I'm pretty sure it's possible. I want to make sure I do it right, though, so I'm wondering if anyone has experience configuring this and, maybe more importantly, knows of any caveats?
In your settings set:
CACHE_MACHINE_USE_REDIS = True
REDIS_BACKEND = 'redis://127.0.0.1:6379?socket_timeout=0.1'
https://github.com/jbalogh/django-cache-machine/blob/master/caching/invalidation.py#L187
https://github.com/jbalogh/django-cache-machine/blob/master/caching/invalidation.py#L213
I have a little experience from my own project, a report system that generates tables from about 50 million records.
The database is MySQL; my settings and models are below, FYI.
settings:
# cache machine
CACHES = {
    'default': {
        'BACKEND': 'caching.backends.memcached.MemcachedCache',
        'LOCATION': [
            '127.0.0.1:11211',
        ],
        'PREFIX': 'report:',
    },
}
CACHE_COUNT_TIMEOUT = 60 * 60 * 24  # one day, in seconds
CACHE_EMPTY_QUERYSETS = True
models:
from caching.base import CachingManager, CachingMixin

class App(CachingMixin, models.Model):
    objects = CachingManager()
    name = models.CharField(max_length=64, default='')
Note that cache-machine works fine for queryset.filter() and count(), but not for queryset.annotate() or aggregate(). And of course, don't forget to start your memcached server first.
When running, you can see cache-machine entries in your django*.log telling you whether the cache was hit or missed.
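As a rough sketch of what the mixin buys you (the filter value here is made up):
# With CachingMixin and CachingManager in place, identical querysets are
# served from memcached after the first evaluation:
apps = list(App.objects.filter(name='report'))  # first call hits MySQL and caches the result
apps = list(App.objects.filter(name='report'))  # the repeated call is served from the cache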

Why does Django give me different results for the same query?

For a mock web service, I wrote a little Django app that serves as a web API, which my Android application queries. When I make requests to the API, I can also hand over an offset and limit so that only the really necessary data is transmitted. Anyway, I ran into the problem that Django gives me different results for the same query to the API. It seems as if the results are returned round-robin.
This is the Django code that will be run:
def getMetaForCategory(request, offset, limit):
    if request.method == "GET":
        result = {"meta_information": []}
        categoryIDs = request.GET.getlist("category_ids[]")
        categorySet = set(toInt(categoryIDs))
        categories = Category.objects.filter(id__in=categoryIDs)
        metaSet = set([])
        for category in categories:
            metaSet = metaSet | set(category.meta_information.all())
        metaList = list(metaSet)
        metaList.sort()
        for meta in metaList[int(offset):int(limit)]:
            relatedCategoryIDs = getIDs(meta.category_set.all())
            item = {
                "_id": meta.id,
                "name": meta.name,
                "type": meta.type,
                "categories": list(categorySet & set(relatedCategoryIDs)),
            }
            result['meta_information'].append(item)
        return HttpResponse(content=simplejson.dumps(result), mimetype="application/json")
    else:
        return HttpResponse(status=403)
What happens is the following: if all MetaInformation objects were Foo, Bar, Baz and Blib and I set the limit to 0:2, the first request would return [Foo, Bar] and the exact same request would return [Baz, Blib] the second time.
Does anyone see what I am doing wrong here? Or is it the Django cache that somehow gets into my way?
I think the difficulty is that you are using a set to store your objects, and slicing that - and sets have no ordering (they are like dictionaries in that way). So, the results from your query are in fact indeterminate.
There are various implementations of ordered sets around - you could look into using one of them. However, I must say that I think you are doing a lot of unnecessary and expensive unique-ifying and sorting in Python, when most of this could be done directly by the database. For instance, you seem to be trying to get the unique list of Metas that are related to the categories you pass. Well, this could be done in a single ORM query:
meta_list = MetaInformation.objects.filter(category__id__in=categoryIDs).distinct()
and you could then drop the set, looping and sorting commands.
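A sketch of the full replacement, assuming the Metas should be ordered by a name field before slicing (the sort key is a guess, since the question's sort criterion isn't shown):
meta_list = (MetaInformation.objects
             .filter(category__id__in=categoryIDs)
             .distinct()
             .order_by('name')[int(offset):int(limit)])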

Do we need cache for an array?

We're developing a web-based project using Django, and we cache DB operations to get better performance. But I'm wondering whether we need to cache this array.
The code sample looks like this:
ABigArray = {
    "1": {
        "name": "xx",
        "gender": "xxx",
        ...
    },
    "2": {
        ...
    },
    ...
}
class Items:
    def __init__(self):
        self.data = ABigArray

    def get_item_by_id(self, id):
        item = cache.get("item" + str(id))  # get the cached item if possible
        if item:
            return item
        else:
            item = self.data.get(str(id))
            cache.set("item" + str(id), item)
            return item
So I'm wondering whether we really need such a cache, since IMO the array (ABigArray) is already loaded into memory when we try to get one item, so we don't need the cache in this case, right? Or am I wrong?
Please correct me if I'm wrong.
Thanks.
You've cut out a bit too much information, but it looks like the "array" (actually a dictionary) is always the same - there's a single instance that is created when the module is first imported, and will be used by every Items object. So there's absolutely nothing to be gained by caching it - in fact you will lose by doing so, as you will introduce an unnecessary round trip to get the data from the cache.
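A minimal sketch of that point: since ABigArray is a module-level constant, a plain dict lookup is already an in-memory operation, and the cache round trip only adds latency.
class Items:
    def get_item_by_id(self, id):
        # ABigArray already lives in this process's memory; no cache layer needed
        return ABigArray.get(str(id))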