Django - Johnny Cache for multiple processes

I've configured Johnny Cache for one of my applications, which is hosted on Apache. It is configured with memcached as the backend, running on the same machine on the default port.
The caching works fine when multiple web clients go through Apache: they all read from the cache, and any update invalidates it. But when a Python program/script reads from the DB using Django (with the same settings.py that has the Johnny configuration), it doesn't read from the cache, and hence any updates made by that program won't invalidate the cache. That leaves the web clients reading stale data from the cache.
I haven't found anything in johnny cache's documentation related to this. Any thoughts on this situation?
I'm using johnny cache 0.3.3, django 1.2.5 and python 2.7.
Edit:
To answer one of the questions in the comments, I read from the DB in the script this way:
>>> cmp = MyModelClass.objects.get(id=1)
>>> cmp.cust_field_2
u'aaaa'
I know it doesn't read from the cache because when I update the table directly by firing an UPDATE SQL statement, the updated value is not reflected in my web client, which still reads from the cache, whereas my script shows the updated value when I re-fetch the object using MyModelClass.objects.get(id=1).
Thanks,

It appears that middleware is not called when you run scripts/management commands, which is why you are seeing the difference. This makes sense when reading the documentation on middleware, because it processes things like requests and views, which don't exist in a custom script.
I found a way around this, and there is an issue regarding it in the Johnny Cache bitbucket repo. In your script put the following before you do anything with the database:
from johnny.middleware import QueryCacheMiddleware
qcm = QueryCacheMiddleware()
# put the code for your script here
qcm.unpatch()
You can see more on that here:
https://bitbucket.org/jmoiron/johnny-cache/issue/49/offline-caching
and here:
https://bitbucket.org/jmoiron/johnny-cache/issue/50/johhny-cache-not-active-in-management

The recommended way, per the documentation, is:
from johnny.cache import enable
enable()
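For a standalone script, that might look roughly like this (a sketch: myproject and myapp are placeholder names, while MyModelClass and cust_field_2 are taken from the question above):

# standalone_script.py -- minimal sketch of a script that patches the ORM
# before touching the database (module names are placeholders)
import os
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")

from johnny.cache import enable
enable()  # patch the ORM so reads go through, and writes invalidate, the shared cache

from myapp.models import MyModelClass
obj = MyModelClass.objects.get(id=1)
print(obj.cust_field_2)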
Update:
What I observed: if your tasks.py files have this at the beginning, you can no longer disable Johnny Cache via settings.py.
I have reported the issue: https://github.com/jmoiron/johnny-cache/issues/27

Related

High response time when setting value for Django settings module inside a middleware

In a Django project of mine, I've written a middleware that performs an operation for every app user.
I've noticed that the response time balloons up if I write the following at the start of the middleware module:
import os
os.environ.setdefault("DJANGO_SETTINGS_MODULE","myproject.settings")
It's about 10 times less if I omit these lines. Being a beginner, I'm trying to clarify why there's such a large differential between the respective response times. Can an expert explain it? Have you seen something like it before?
p.s. I already know why I shouldn't modify the environment variable for Django settings inside a middleware, so don't worry about that.
The reason likely has something to do with Django reloading your settings configuration for every request rather than once per server thread/process (and thus also re-instantiating/re-connecting to your database, cache, etc.). You will want to confirm this with profiling. This behavior is also very likely dependent on which app server you are running.
If you really want this level of control over your settings, it is much easier to add this line to manage.py, wsgi.py, or whatever file/script you use to launch your app server.
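For example, in a WSGI entry point (a rough sketch, assuming Django 1.4+ and a project named myproject, as in the question):

# wsgi.py: set the settings module once at process startup,
# not on every request in a middleware
import os
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")

from django.core.wsgi import get_wsgi_application
application = get_wsgi_application()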
P.S. If you already know you shouldn’t do it, why are you doing it?

Django-skel slow due to httplib requests to S3

G'day,
I am playing around with django-skel on a recent project and have used most of its defaults: Heroku for hosting and S3 for file storage. I'm mostly serving a static-y site, except for using sorl for thumbnail generation; however, the response times are pathetic.
You can visit the site: http://bit.ly/XlzkXp
My template looks like: https://gist.github.com/cd15e320be6f4454a7fb
I'm serving the template using a shortcut from the URL conf, no database usage at all: https://gist.github.com/f9d1a9a191959dcff1b5
However, it's consistently taking 15+ seconds for the response. New relic shows this is because of requests going to S3 while processing the view. This does not make any sense to me.
New Relic data: http://i.imgur.com/vs9ZTLP.png?1
Why is something using httplib to request things from S3? I can see how collectstatic might be doing it, but not the processing of the view itself.
What am I not understanding about Django-skel and this setup?
I have the same issue. My guess is that django-compress and django-storage are both in use, which results in the former saving the cache it needs to render templates to the S3 bucket and then reading it back (over the network, hence httplib) while rendering each template.
My second guess was that following the django-compress instructions for remote storages and implementing an "S3 storage backend which caches files locally, too" would resolve this issue (a sketch of that backend is shown after this answer).
Although it makes sense to me that saving the cache to both locations, local and S3, and reading from the local filesystem first should speed things up, it somehow does not work this way: the response time is still around 8+ seconds.
By disabling django-compress with COMPRESS_ENABLED = False, I managed to get a 1-1.3 second average response time.
Any ideas?
(I will update this answer in case of any progress)
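For reference, the "S3 storage backend which caches files locally, too" mentioned above is roughly the recipe from the django-compressor remote-storage docs (a sketch; it assumes django-storages with the s3boto backend, and the class and setting names are the usual ones from that recipe, not from the question):

# storage.py
from django.core.files.storage import get_storage_class
from storages.backends.s3boto import S3BotoStorage

class CachedS3BotoStorage(S3BotoStorage):
    """S3 storage that also writes each file to local storage, so the
    compressor can read it from disk instead of over httplib."""
    def __init__(self, *args, **kwargs):
        super(CachedS3BotoStorage, self).__init__(*args, **kwargs)
        self.local_storage = get_storage_class(
            "compressor.storage.CompressorFileStorage")()

    def save(self, name, content):
        name = super(CachedS3BotoStorage, self).save(name, content)
        self.local_storage._save(name, content)
        return name

settings.py would then point COMPRESS_STORAGE (and typically STATICFILES_STORAGE) at this class.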

Where does Django-celery/RabbitMQ store task results?

My celery database backend settings are:
CELERY_RESULT_BACKEND = "database"
CELERY_RESULT_DBURI = "mysqlite.db"
I am using RabbitMQ as my messager.
It doesn't seem like any results are getting stored in the db, and yet I can read the results after the task is complete. Are they in memory or a RabbitMQ cache?
I haven't tried reading the same result multiple times, so maybe it's a read-once-then-poof!
CELERY_RESULT_DBURI is for the sqlalchemy result backend, not the Django one.
The Django one always uses the default database configured in the DATABASES setting (or the DATABASE_* settings on older Django versions).
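To illustrate the difference (a sketch; the table name and URL format below are the usual ones for these backends, not taken from the question):

# settings.py with django-celery: results go into the default Django database,
# typically the djcelery_taskmeta table; CELERY_RESULT_DBURI is ignored here.
CELERY_RESULT_BACKEND = "database"

# With plain celery + SQLAlchemy, CELERY_RESULT_DBURI applies instead, and it
# must be a SQLAlchemy URL rather than a bare filename, e.g.:
# CELERY_RESULT_DBURI = "sqlite:///mysqlite.db"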
My Celery daemons work just fine, but I'm having difficulties collecting task results. task_result.get() leads to a timeout, and task.state is always PENDING (but the jobs are completed). I tried separate sqlite dbs and a single postgres db shared by the workers, but I still can't get results. CELERY_RESULT_DBURI seems useless to me (for celery 2.5); I think it's a newer configuration option. Any suggestions are welcome...
EDIT: it's all my fault:
I pass extra parameters to my tasks via the decorators, and the ignore_results=True parameter creates this problem. I deleted this key and it works like a charm :)
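In other words, results are only stored for tasks that don't ignore them; something like this (a sketch with a made-up task; the flag is spelled ignore_result in the Celery API):

from celery.task import task

@task(ignore_result=False)  # or simply omit the flag; True suppresses result storage
def add(x, y):
    return x + y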

What is the best deployment configuration for Django?

I will be deploying my Django project on the server. For that purpose I plan on doing the following optimizations.
What I would like to know is: am I missing something? How can I do it in a better manner?
Front-end:
Django-static (For compressing static media)
Running jquery from CDN
Cache-Control headers
Indexing the Django db (For certain models)
Server side:
uWSGI and nginx.
Memcached (For certain queries)
Putting the media and database on separate servers
These are some optimization I use on a regular basis:
frontend:
Use a JS loading library like labjs, requirejs or yepnope. You should still compress/merge your JS files, but in most use cases it seems better to make several requests to several JS files and run them in parallel than to have one huge JS file to run on each page. I always split them up into groups that make sense, to balance requests and parallel loading. Some also allow for conditional loading and failovers (i.e. if for some reason your CDN'd jQuery is not there anymore).
Use sprites where possible.
Backend:
configure django-compressor (django-static is fine)
Enable gzip compression in nginx.
If you are using postgresql (which is the recommended sql database), use something like pgbouncer or pgpool2.
Use and configure cache (I use redis)
(already mentioned - use celery for everything that might take longer)
Small database work: use indexes where needed, and look out for making too many queries (common when not using select_related where you should) or slow queries (enable slow-query logging in your db). Always use select_related with arguments; see the sketch after this list.
If implementing search, I always use a standalone search engine. (elasticsearch/solr)
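As an example of the select_related point above (a sketch with hypothetical models, not from the question):

from django.db import models

class Author(models.Model):
    name = models.CharField(max_length=100)

class Book(models.Model):
    author = models.ForeignKey(Author)
    title = models.CharField(max_length=200)

# N+1 queries: one for the books, plus one per author
for book in Book.objects.all():
    print(book.author.name)

# One query with a join; passing the field name keeps the join targeted
for book in Book.objects.select_related("author"):
    print(book.author.name)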
Now comes profiling the app and looking for code-specific improvements. Some things to keep an eye on.
An option may be installing Celery if you need to support asynchronous & periodic tasks. If you do so, consider installing Redis instead of Memcached. Using Redis you can manage sessions and carry out Celery operations as well as do caching.
Take a look here: http://unfoldthat.com/2011/09/14/try-redis-instead.html
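A rough sketch of that Redis-for-everything setup (assuming django-redis-cache and Celery 3.x-style settings; none of these values come from the question):

# settings.py
CACHES = {
    "default": {
        "BACKEND": "redis_cache.RedisCache",
        "LOCATION": "127.0.0.1:6379",
    }
}
SESSION_ENGINE = "django.contrib.sessions.backends.cache"  # sessions stored in the cache above
BROKER_URL = "redis://127.0.0.1:6379/0"                    # Celery broker
CELERY_RESULT_BACKEND = "redis://127.0.0.1:6379/1"         # Celery results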

which django cache backend to use?

I've been using cmemcache + memcache for a while with positive results.
However, cmemcache hasn't been keeping up lately, and I also found that it's no longer recommended. I have now installed python-memcached and it's working well. Since I have decided to change, I would like to try some other cache backend; any recommendations?
I have also come across pylibmc and python-libmemcached; any others?
Has anyone tried the nginx memcached module?
Thanks
Only cmemcache and python-memcached are supported by the Django memcached backend. I don't have experience with either of the two libraries you mentioned, but if you want to use them you will need to write a new cache backend.
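For reference, the python-memcached setup is just the stock Django backend (a sketch using the Django 1.3+ CACHES syntax):

# settings.py
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
        "LOCATION": "127.0.0.1:11211",
    }
}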
The nginx memcached module is something completely different. It doesn't really seem like it would work well with Django's caching, but it might with enough hacking. At any rate, I wouldn't use it, since it is much better if you control what gets cached and retrieved from cache from the Python side.