I am using redis as a broker between Django and Celery. The redis instance I have access to is shared with many other applications, so the broker is not reliable (the redis keys it uses get deleted by others, and messages often get sent to workers in other applications). Changing the redis database does not solve the problem (there are few databases and many applications).
How can I configure Celery to prefix all the keys it uses with a custom string? The docs mention ways to add prefixes to queue names, but that does not affect the redis keys. As far as I can tell, the underlying library (Kombu) does not let the user prefix the keys it uses.
The functionality to add a prefix to all the redis keys has since been added. You can now configure it like this:
from celery import Celery

BROKER_URL = 'redis://localhost:6379/0'
celery = Celery('tasks', broker=BROKER_URL, backend=BROKER_URL)
# Every key the redis broker transport creates will be prefixed with "prefix"
celery.conf.broker_transport_options = {'global_keyprefix': "prefix"}
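If you configure Celery from Django settings instead of on the app object, the equivalent would be the following sketch; it assumes the usual app.config_from_object('django.conf:settings', namespace='CELERY') setup, so adjust the setting name to match your namespace:
# settings.py
CELERY_BROKER_TRANSPORT_OPTIONS = {'global_keyprefix': 'prefix'}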
This is not supported by Celery yet. A pull request on this subject is currently stalled due to a lack of workforce:
https://github.com/celery/kombu/pull/912
You can also just override the key prefix used by your celery task's result backend:
from celery import shared_task

@shared_task(bind=True)
def task(self, params):
    # Change the prefix the result backend uses for this task's result keys
    self.backend.task_keyprefix = b'new-prefix'
I put Celery in my Django app so that two other python programs can process the input from my Django app via subprocess calls.
My question is: how do I access the output from those subprocesses? Back when this was just a standalone python program, I accessed the log files (the output from the two apps) via stdout and stderr. Is it the same when I use Celery in Django? And does the value of CELERY_RESULT_BACKEND (should I point it at my Django app's db?) have anything to do with the log files?
So far what I've done is:
Access the two apps via subprocess in my tasks.py
Assigned my broker's db, Redis, as the CELERY_RESULT_BACKEND for now. My plan is to get the log files and then save them to my Django app's db so that I can just access that db.
Can you offer some help?
Typically, you only care about the task result, which is the return value of the celery task, and that is stored in your result_backend for at least result_expires time (usually 1 day). So, to the extent that you want to access any particular task's result, you can just do so using the task ID.
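As a concrete illustration (a minimal sketch, not your exact code: run_script and script.py are made-up names, and it assumes Python 3.7+ for subprocess.run with capture_output), you can capture the subprocess output inside the task and return it, so it ends up in the result backend:
import subprocess
from celery import shared_task

@shared_task
def run_script():
    # Capture stdout/stderr of the external program and return them as the task result
    proc = subprocess.run(['python', 'script.py'], capture_output=True, text=True)
    return {'stdout': proc.stdout, 'stderr': proc.stderr, 'returncode': proc.returncode}
Later, for example in a Django view, you can fetch the stored result by task ID:
from celery.result import AsyncResult

result = AsyncResult(task_id)
if result.ready():
    output = result.get()['stdout']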
I am a newbie in python / django. Using the neo4django library and an existing Neo4j database, I would like to connect to it and test whether the connection is successful. How can I achieve this?
You don't 'connect' to a database yourself anymore; that is the framework's job. You just define the connection parameters and start writing models.
Those models are your entities, with fields you can use like ordinary attributes. In other words, your models are your definition of the database tables.
To test the connection itself, you can request http://host:7474/db/data/ and check that it returns a 200 OK.
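A minimal sketch of that check, assuming the requests library is installed (the function name is mine):
import requests

def neo4j_is_up(base_url='http://localhost:7474/db/data/'):
    # The Neo4j REST root answers 200 when the server is reachable
    try:
        return requests.get(base_url, timeout=5).status_code == 200
    except requests.exceptions.ConnectionError:
        return False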
I don't know much about neo4django, but you can test if a database is accessible with py2neo, the general purpose Python driver (http://py2neo.org/2.0/).
One simple way is to ask the server for its version:
from py2neo import Graph
from py2neo.packages.httpstream import SocketError

# adjust as necessary
graph = Graph("http://localhost:7474/db/data/")

try:
    print(graph.neo4j_version)
except SocketError:
    print('No connection to database.')
I am using django as a framework to build a content management system for a site with a blog.
Each blog post will have a route that contains a unique identifier for the blog post. These blog posts can be scheduled and have an expiry date. This means that the routes have to be dynamic.
The entire site needs to be cached and we have redis set up as a back end cache. We currently cache rendered pages against our static routes, but need to find a way of caching pages against the dynamic routes (and invalidating them when the blog posts expire).
I could use a cron job but it isn't appropriate because...
a) New blog posts go live rarely and not periodically
b) Users can schedule posts to the minute. This means that a cron job would have to run every minute which seems like overkill!
I've just found the django-cacheops library, which seems to do exactly what I need (schedule the invalidation of our cache and invalidate them via signals). Is this compatible with our existing setup and how easy is the setup?
I assume this is a pretty common problem - does anyone have any better ideas than the above?
I can't comment on django-cacheops because I've never used it, but Redis provides a really easy way to do this using the EXPIRE command:
Set a timeout on key. After the timeout has expired, the key will automatically be deleted.
Usage:
SET some_key "some_value"
EXPIRE some_key 10
The key some_key will now automatically be deleted by Redis after 10 seconds. Since you know from the outset when each blog post's cache should be deleted, this should serve your needs perfectly.
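A minimal sketch of that idea through Django's cache framework (assuming your Redis backend is configured in CACHES and that the Post model has a hypothetical expires_at datetime field):
from django.core.cache import cache
from django.utils import timezone

def cache_rendered_post(post, rendered_html):
    # Let the cache entry expire exactly when the post itself expires
    ttl = int((post.expires_at - timezone.now()).total_seconds())
    if ttl > 0:
        cache.set('post:{}'.format(post.pk), rendered_html, timeout=ttl)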
Cacheops invalidates the cache when a post is changed; that's its primary use. But you can also expire by timeout:
from cacheops import cached_as, cached_view_as

# A queryset
post = Post.objects.cache(timeout=your_timeout).get(pk=post_pk)

# A function
@cached_as(Post.objects.filter(pk=post_pk), timeout=your_timeout)
def get_post_data(...):
    ...

# A view
@cached_view_as(Post, timeout=your_timeout)
def post(request, ...):
    ...
However, there is currently no way to make the timeout depend on the cached object itself.
I have a Django 1.5.1 webapp using Celery 3.0.23 with RabbitMQ 3.1.5. and sqlite3.
I can submit jobs using a simple result = status.tasks.mymethod.delay(parameter), and all tasks execute correctly:
[2013-09-30 17:04:11,369: INFO/MainProcess] Got task from broker: status.tasks.prova[a22bf0b9-0d5b-4ce5-967a-750f679f40be]
[2013-09-30 17:04:11,566: INFO/MainProcess] Task status.tasks.mymethod[a22bf0b9-0d5b-4ce5-967a-750f679f40be] succeeded in 0.194540023804s: u'Done'
I want to display in a page the latest 10 jobs submitted and their status. Is there a way in Django to get such objects? I see a couple of tables in the database (celery_taskmeta and celery_taskmeta_2ff6b945) and have tried some ways of accessing the objects, but Django always displays an AttributeError page.
What is the correct way to access Celery results from Django?
Doing
cel = celery.status.tasks.get(None)
cel = status.tasks.all()
does not work, resulting in the aforementioned AttributeError. (status is the name of my app)
EDIT: I am sure tasks are saved, as this small tutorial says:
By default django-celery stores this state in the Django database. You may consider choosing an alternate result backend or disabling states altogether (see Result Backends).
Following the links, there are only references on how to set up the DB connection, not how to retrieve the results.
Try this:
from djcelery.models import TaskMeta
TaskMeta.objects.all()
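Building on that (a sketch; the djcelery TaskMeta model stores fields such as task_id, status, result and date_done), the latest 10 results can be listed like this:
from djcelery.models import TaskMeta

# Ten most recently finished tasks, newest first
for t in TaskMeta.objects.order_by('-date_done')[:10]:
    print(t.task_id, t.status, t.result)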
I want to develop an application which uses Django as the frontend and Celery to do background stuff.
Now, sometimes Celery workers on different machines need database access to my django frontend machine (two different servers).
They need to know some realtime stuff, and to run the django app with
python manage.py celeryd
they need access to a database with all models available.
Do I have to access my MySQL database through a direct connection? That is, do I have to allow the user "my-django-app" access not only from localhost on my frontend machine, but also from my other worker servers' IPs?
Is this the "right" way, or I'm missing something? Just thought it isn't really safe (without ssl), but maybe that's just the way it has to be.
Thanks for your responses!
They will need access to the database. That access will be through a database backend, which can be one that ships with Django or one from a third party.
One thing I've done in my Django site's settings.py is load database access info from a file in /etc. This way the access setup (database host, port, username, password) can be different for each machine, and sensitive info like the password isn't in my project's repository. You might want to restrict access to the workers in a similar manner, by making them connect with a different username.
You could also pass in the database connection information, or even just a key or path to a configuration file, via environment variables, and handle it in settings.py.
For example, here's how I pull in my database configuration file:
import os

g = {}
dbSetup = {}
execfile(os.environ['DB_CONFIG'], g, dbSetup)
if 'databases' in dbSetup:
    DATABASES = dbSetup['databases']
else:
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.mysql',
            # ...
        }
    }
Needless to say, you need to make sure that the file in DB_CONFIG is not accessible to any user besides the db admins and Django itself. The default case should refer Django to a developer's own test database. There may also be a better solution using the ast module instead of execfile, but I haven't researched it yet.
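One possible shape of that alternative (just a sketch: it assumes the config file contains a single Python dict literal rather than executable code):
import ast
import os

with open(os.environ['DB_CONFIG']) as f:
    # literal_eval only accepts literals, so arbitrary code in the file will not run
    dbSetup = ast.literal_eval(f.read())

if 'databases' in dbSetup:
    DATABASES = dbSetup['databases']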
Another thing I do is use separate users for DB admin tasks vs. everything else. In my manage.py, I added the following preamble:
# Find a database configuration, if there is one, and set it in the environment.
adminDBConfFile = '/etc/django/db_admin.py'
dbConfFile = '/etc/django/db_regular.py'

import sys
import os

def goodFile(path):
    return os.path.isfile(path) and os.access(path, os.R_OK)

if len(sys.argv) >= 2 and sys.argv[1] in ["syncdb", "dbshell", "migrate"] \
        and goodFile(adminDBConfFile):
    os.environ['DB_CONFIG'] = adminDBConfFile
elif goodFile(dbConfFile):
    os.environ['DB_CONFIG'] = dbConfFile
Where the config in /etc/django/db_regular.py is for a user with access to only the Django database with SELECT, INSERT, UPDATE, and DELETE, and /etc/django/db_admin.py is for a user with these permissions plus CREATE, DROP, INDEX, ALTER, and LOCK TABLES. (The migrate command is from South.) This gives me some protection from Django code messing with my schema at runtime, and it limits the damage an SQL injection attack can cause (though you should still check and filter all user input).
This isn't a solution to your exact problem, but it might give you some ideas for ways to smarten up Django's database access setup for your purposes.