I use Django-Celery + RabbitMQ to execute some asynchronous tasks. I defined a queue 'sendmail' for the send-email task; sending mail is triggered by a specific task (which has its own queue). Now I have a problem: after the specific task finishes, the mail is sometimes sent at once and sometimes only after 5-20 minutes. I want to know what causes this.
When task.delay() is called, Django-Celery packages the task name and parameters as a message and sends it to RabbitMQ.
I want to know when the message reaches RabbitMQ, but the web management tool only shows the total number of messages, not each message's details, and in particular not the time each message arrived. The Django-Celery log only shows when the worker got the task from the broker and when it executed the task. I want to know all the related time points so I can determine which step consumes most of the time.
Django-Celery does (I believe) report task data on a per-task basis. When you sync your database, it creates a bunch of monitoring tables which are accessible via the admin. However, for tasks to be recorded in these tables, you need to run the celerycam program in the Django context (python ./manage.py celerycam). The celerycam program takes "snapshots" of your tasks every second or so (by default) and records information about them. Another useful monitoring tool is the celeryev program (which also has to run in the Django context: python ./manage.py celeryev). This is a command-line ncurses program that reports real-time information about tasks as they occur. Finally, rabbitmqctl has a bunch of options that might help with monitoring.
This is a particularly useful page in the docs:
http://celery.github.com/celery/userguide/monitoring.html
Anyway, this is what I use to monitor my tasks when using celery.
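To find out which step consumes the time, the event stream described in the monitoring guide linked above can be reduced to per-stage durations. The sketch below assumes you have already captured the event dicts for one task (for example with an event receiver or celerycam), and that each dict carries the `type`, `uuid` and `timestamp` fields Celery events use. `task-sent` events are only emitted when sent events are enabled on the client (`CELERY_SEND_TASK_SENT_EVENT`, `task_send_sent_event` in Celery 4+). The function name `stage_durations` is hypothetical.

```python
from typing import Dict, Iterable, Optional

# Lifecycle order of the Celery event types used below.
STAGES = ("task-sent", "task-received", "task-started", "task-succeeded")


def stage_durations(events: Iterable[dict], task_id: str) -> Dict[str, Optional[float]]:
    """Return the seconds spent between consecutive lifecycle events of one task.

    Missing events (e.g. no 'task-sent' because sent events are disabled)
    yield None for the stages that depend on them.
    """
    times: Dict[str, float] = {}
    for ev in events:
        if ev.get("uuid") == task_id and ev.get("type") in STAGES:
            # Keep the first timestamp seen for each event type.
            times.setdefault(ev["type"], ev["timestamp"])

    def gap(start: str, end: str) -> Optional[float]:
        if start in times and end in times:
            return times[end] - times[start]
        return None

    return {
        # Time the message sat in the broker queue.
        "queued": gap("task-sent", "task-received"),
        # Time the worker held the message (prefetched) before running it.
        "reserved": gap("task-received", "task-started"),
        # Actual execution time.
        "running": gap("task-started", "task-succeeded"),
    }


if __name__ == "__main__":
    # Made-up timestamps, for illustration only.
    sample = [
        {"type": "task-sent", "uuid": "abc", "timestamp": 100.0},
        {"type": "task-received", "uuid": "abc", "timestamp": 100.5},
        {"type": "task-started", "uuid": "abc", "timestamp": 400.5},
        {"type": "task-succeeded", "uuid": "abc", "timestamp": 401.0},
    ]
    print(stage_durations(sample, "abc"))
    # {'queued': 0.5, 'reserved': 300.0, 'running': 0.5}
```

In this made-up sample the delay sits between "received" and "started", which would suggest the message waited in the worker's prefetch buffer rather than in RabbitMQ.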
Related
In my Django project I'm using Celery with a RabbitMQ broker for asynchronous tasks, how can I record the information of all of my tasks (e.g. created time (task appears in queue), worker consume task time, execution time, status, ...) to monitor how Celery is doing?
I know there are solutions like Flower, but that seems like too much for what I need. django-celery-results looks like what I want, but it's missing some information I need, such as the task's created time.
Thanks!
It seems like you often find the answer yourself after asking on SO. I settled on using Celery signals to do all the recording I want and store the results in a database table.
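A minimal sketch of that approach, assuming Celery 4+ (message protocol 2, where the task id is in `headers['id']`) and using a hypothetical SQLite table `task_log` in place of a Django model. The handlers are plain functions so they can be exercised without a broker; the comment at the bottom shows how they would be connected to Celery's `before_task_publish`, `task_prerun` and `task_postrun` signals. Note that `before_task_publish` fires in the process that sends the task, while the other two fire in the worker, so in practice both must write to a shared database rather than the in-memory one used here.

```python
import sqlite3
import time

# Hypothetical storage for the demo; a real setup needs a database shared
# by the web process and the workers (e.g. a Django model).
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute(
    "CREATE TABLE task_log ("
    " task_id TEXT PRIMARY KEY, name TEXT,"
    " created REAL, started REAL, finished REAL, state TEXT)"
)


def on_publish(sender=None, headers=None, **kwargs):
    # before_task_publish: sender is the task name, headers['id'] the task id.
    # INSERT OR IGNORE keeps the first publish time if a retry re-publishes the same id.
    conn.execute(
        "INSERT OR IGNORE INTO task_log (task_id, name, created, state) VALUES (?, ?, ?, ?)",
        (headers["id"], sender, time.time(), "PENDING"),
    )
    conn.commit()


def on_prerun(task_id=None, **kwargs):
    # task_prerun: runs in the worker just before the task body executes.
    conn.execute(
        "UPDATE task_log SET started = ?, state = 'STARTED' WHERE task_id = ?",
        (time.time(), task_id),
    )
    conn.commit()


def on_postrun(task_id=None, state=None, **kwargs):
    # task_postrun: runs after the task body; state is e.g. 'SUCCESS' or 'FAILURE'.
    conn.execute(
        "UPDATE task_log SET finished = ?, state = ? WHERE task_id = ?",
        (time.time(), state, task_id),
    )
    conn.commit()


# Wiring (requires Celery; receivers must accept **kwargs):
# from celery.signals import before_task_publish, task_prerun, task_postrun
# before_task_publish.connect(on_publish)
# task_prerun.connect(on_prerun)
# task_postrun.connect(on_postrun)
```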
1) I am currently working on a web application that exposes a REST API and uses Django and Celery to handle requests and solve them. To solve a request, a set of Celery tasks has to be submitted to an AMQP queue so that they are executed on workers (located on other machines). Each task is very CPU-intensive and takes a very long time (hours) to finish.
I have configured Celery to also use AMQP as the results backend, and I am using RabbitMQ as Celery's broker.
Each task returns a result that needs to be stored afterwards in a DB, but not by the workers directly. Only the "central node" (the machine running django-celery and publishing tasks to the RabbitMQ queue) has access to this storage DB, so the workers' results somehow have to get back to this machine.
The question is: how can I process the results of the tasks afterwards? After a worker finishes, its result is stored in the configured results backend (AMQP), but I don't know the best way to get the results from there and process them.
All I could find in the documentation is that you can either check the result's status from time to time with:
result.state
which means I basically need a dedicated piece of code that runs this command periodically, and therefore keeps a whole thread/process busy with only this, or block everything with:
result.get()
until a task finishes, which is not what I wish.
The only solution I can think of is an extra thread on the "central node" that periodically runs a function checking the AsyncResult objects returned by each task at submission, and takes action when a task has a finished status.
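The extra-thread idea can be kept cheap by calling `ready()` on each pending result (a non-blocking check on Celery's `AsyncResult`) and calling `get()` only once the task is done, so the thread spends almost all its time sleeping. The sketch below assumes objects exposing `ready()` and `get(propagate=...)` like `AsyncResult`; `ResultPoller`, `interval` and the `handle` callback are hypothetical names, and `FakeResult` exists only so the code runs without a broker.

```python
import threading
from typing import Any, Callable, List


class ResultPoller:
    """Background thread that hands finished task results to a callback."""

    def __init__(self, handle: Callable[[Any, Any], None], interval: float = 5.0):
        self.handle = handle          # called as handle(result_obj, value)
        self.interval = interval      # seconds to sleep between sweeps
        self._pending: List[Any] = []
        self._lock = threading.Lock()
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def add(self, async_result: Any) -> None:
        # Register a result returned by task.delay() / apply_async().
        with self._lock:
            self._pending.append(async_result)

    def sweep(self) -> None:
        # One pass: ready() does not block, so only finished tasks are fetched.
        with self._lock:
            done, still_pending = [], []
            for r in self._pending:
                (done if r.ready() else still_pending).append(r)
            self._pending = still_pending
        for r in done:
            # propagate=False returns a task's exception instead of raising it here.
            self.handle(r, r.get(propagate=False))

    def _run(self) -> None:
        while True:
            self.sweep()
            if self._stop_event.wait(self.interval):   # True once stop() was called
                break

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop_event.set()
        self._thread.join()


class FakeResult:
    """Stand-in for AsyncResult, for trying the poller without a broker."""

    def __init__(self, value: Any, ready: bool = False):
        self.value = value
        self.done = ready

    def ready(self) -> bool:
        return self.done

    def get(self, propagate: bool = True) -> Any:
        return self.value
```

The `handle` callback is where the result would be written to the central node's DB.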
Does anyone have any other suggestion?
Also, since the processing of the backend results takes place on the "central node", I aim to minimize the impact of this operation on that machine.
What would be the best way to do that?
2) How do people usually deal with the results that the workers put in the results backend? (assuming a results backend has been configured)
I'm not sure I fully understand your question, but take into account that each task has a task id. If tasks are being sent by users, you can store the ids and then check the results via JSON as follows:
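#urls.py
# Note: patterns() only exists in Django < 1.10 (the djcelery era).
from django.conf.urls import patterns, url
from djcelery.views import is_task_successful
# GET <task_id>/done/ returns JSON saying whether that task finished successfully.
urlpatterns += patterns('',
    url(r'(?P<task_id>[\w\d\-\.]+)/done/?$', is_task_successful,
        name='celery-is_task_successful'),
)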
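On the client side, the view mapped above can be polled with a plain HTTP GET. This sketch assumes the JSON shape djcelery's `is_task_successful` view returns, `{"task": {"id": ..., "executed": true|false}}`; check it against your installed version. `BASE_URL` is hypothetical and depends on where the URL pattern is included.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://localhost:8000/tasks/"  # hypothetical mount point of the URLconf


def parse_done(body: str) -> bool:
    # Extract the "executed" flag from the view's JSON response.
    return bool(json.loads(body)["task"]["executed"])


def task_done(task_id: str) -> bool:
    # GET <BASE_URL><task_id>/done/ and report whether the task succeeded.
    with urlopen(f"{BASE_URL}{task_id}/done/", timeout=10) as resp:
        return parse_done(resp.read().decode("utf-8"))
```

Since the flag reflects whether the task succeeded, `False` covers both "still running" and "failed".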
Another related concept is signals: tasks emit signals at various points of their life cycle, and a task that finishes successfully emits the task_success signal. More can be found under real-time processing in the Celery monitoring guide.
So I have these two applications connected through a REST API (JSON messages), one written in Django and the other in PHP. I have an exact database replica on both sides (using MySQL).
When I press "submit" in one of them, I want the data to be saved in the current app's database and a cron job to be started with Celery/Redis to update the other app's remote database via REST.
My question is: how do I assign the same worker to my tasks in order to keep a FIFO order?
I need my data to be consistent and FIFO is really important.
OK, I am going to detail what I want to do a little further:
So I have this Django app, and when I press submit after filling in the form, my Celery worker wakes up, takes the submitted data and posts it to a remote server. This I can do without problems.
Now, imagine that my internet connection goes down at that exact moment: my Celery worker keeps retrying to send until it succeeds. But if I do another submit before the previous data has been sent, my data won't be consistent on the remote server.
That is my problem. I am not able to make these requests FIFO with the retry option Celery provides, so that's where I need some help figuring it out.
This is the answer I got from another forum:
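The reason Celery's retry breaks the order is that `self.retry()` re-publishes the task as a new message at the back of the queue (after a countdown), so later submissions can be delivered before it. One way to keep FIFO is to have a single consumer retry the head item in place and not move on until it succeeds. A pure-Python sketch of that idea, where `send` is a hypothetical callable standing in for the REST call and the back-off is a fixed delay:

```python
import time
from collections import deque
from typing import Callable, Deque, List


def drain_in_order(
    pending: Deque[dict],
    send: Callable[[dict], None],
    retry_delay: float = 1.0,
    max_attempts: int = 10,
) -> List[dict]:
    """Send items strictly in FIFO order, retrying the head item in place.

    Unlike re-queueing a failed item at the back, nothing behind the head
    is sent until the head succeeds, so the remote side sees the same order.
    Returns the items sent, in order.
    """
    sent = []
    while pending:
        item = pending[0]              # peek; do not remove until it has been sent
        for attempt in range(1, max_attempts + 1):
            try:
                send(item)
                break
            except OSError:            # e.g. network down (ConnectionError is an OSError)
                if attempt == max_attempts:
                    raise              # give up; the item stays at the head
                time.sleep(retry_delay)
        pending.popleft()
        sent.append(item)
    return sent
```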
Use named queues with celery:
http://docs.celeryproject.org/en/latest/userguide/workers.html#queues
Start a worker process with a single worker:
http://docs.celeryproject.org/en/latest/django/first-steps-with-django.html#starting-the-worker-process
Set this worker to consume from the appropriate queue:
http://docs.celeryproject.org/en/latest/userguide/workers.html#queues-adding-consumers
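Put together, the three steps above could look like the following settings fragment. It assumes Celery 4+ lowercase setting names and a hypothetical task `myapp.tasks.push_to_remote` routed to a hypothetical queue `remote_sync`. `worker_prefetch_multiplier = 1` together with `task_acks_late = True` is intended to keep the single worker from holding a second message while the first is still running.

```python
# celeryconfig.py (sketch; Celery 4+ setting names)

# Route the hypothetical sync task to its own queue.
task_routes = {
    "myapp.tasks.push_to_remote": {"queue": "remote_sync"},
}

# Reserve one message at a time and acknowledge it only after the task finishes,
# so the dedicated worker never has a second message in flight.
worker_prefetch_multiplier = 1
task_acks_late = True
```

The dedicated worker would then be started with something like `celery -A proj worker -Q remote_sync --concurrency=1`, where `proj` is a placeholder for your Celery app module.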
For the FIFO part, I can sort my Celery broker in FIFO order before sending my requests.
I have a custom Django command that reads an RSS feed, looks for new feeds and, if any new feed is found, pushes it (pusher.com) to my web app hosted on Heroku (heroku.com). This check needs to run as often as possible, say every second, so that new feeds arrive as soon as possible.
The two issues I have are:
As this app will only be used by a few people (2-3), the command must run only while at least one of these people is inside the app, so I don't overload the server with jobs.
Once the user leaves the app (maybe they just closed it, or they have been inactive for a certain time, i.e. not clicking anything), the command must stop checking the RSS feed.
My questions are:
Where should I run the command from? Directly from a view, or from a signal?
How could I interrupt such command once the user leaves the app?
Thanks in advance for any help :)
You could use the request_finished signal. In the signal handler you could launch a Celery task, so the user doesn't have to wait for the RSS server request to finish.
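For the "stop when the user leaves" part, one approach is to record a last-seen timestamp per user and let the periodic check run only while someone was seen recently. Since `request_finished` is sent without the request object, the user would have to be recorded elsewhere, e.g. in a view or middleware. A minimal sketch; `ActivityTracker`, `maybe_check_rss`, the 60-second default timeout and where `touch()` is called from are all assumptions, and a real deployment would keep the timestamps in a shared store (database or cache) rather than in process memory.

```python
import time
from typing import Callable, Dict


class ActivityTracker:
    """Remembers when each user was last seen and says whether polling should run."""

    def __init__(self, timeout: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout      # seconds of inactivity before a user counts as gone
        self.clock = clock          # injectable for testing
        self._last_seen: Dict[str, float] = {}

    def touch(self, user_id: str) -> None:
        # Call on every request (or client heartbeat) from that user.
        self._last_seen[user_id] = self.clock()

    def leave(self, user_id: str) -> None:
        # Call on explicit logout / page close, if that can be detected.
        self._last_seen.pop(user_id, None)

    def anyone_active(self) -> bool:
        now = self.clock()
        return any(now - t < self.timeout for t in self._last_seen.values())


def maybe_check_rss(tracker: ActivityTracker, check: Callable[[], None]) -> bool:
    """Run check() (the RSS poll) only if some user is active; return whether it ran."""
    if tracker.anyone_active():
        check()
        return True
    return False
```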
One of the characteristics I love most about Google's Task Queue is its simplicity. More specifically, I love that it takes a URL and some parameters and then posts to that URL when the task queue is ready to execute the task.
This structure means that the tasks are always executing the most current version of the code. Conversely, my Gearman workers all run code within my Django project, so when I push a new version live, I have to kill off the old worker and run a new one so that it uses the current version of the code.
My goal is to have the task queue be independent from the code base so that I can push a new live version without restarting any workers. So, I got to thinking: why not make tasks executable by URL, just like the Google App Engine task queue?
The process would work like this:
User request comes in and triggers a few tasks that shouldn't be blocking.
Each task has a unique URL, so I enqueue a Gearman task to POST to the specified URL.
The Gearman server finds a worker and passes the URL and POST data to it.
The worker simply posts to the URL with the data, thus executing the task.
Assume the following:
Each request from a gearman worker is signed somehow so that we know it's coming from a gearman server and not a malicious request.
Tasks are limited to run in less than 10 seconds (There would be no long tasks that could timeout)
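The "signed somehow" assumption could be met with an HMAC over the request body, using a secret shared by the Gearman workers and the web app. A stdlib-only sketch; the header name `X-task-signature`, the secret, and the JSON body format are hypothetical choices.

```python
import hashlib
import hmac
import json
from urllib.request import Request

# Hypothetical header name, written in the capitalization urllib stores it in.
SIGNATURE_HEADER = "X-task-signature"


def sign(body: bytes, secret: bytes) -> str:
    # Hex HMAC-SHA256 of the exact bytes that will be POSTed.
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def build_task_request(url: str, data: dict, secret: bytes) -> Request:
    """Worker side: build (but do not send) the signed POST for one task."""
    body = json.dumps(data, sort_keys=True).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", SIGNATURE_HEADER: sign(body, secret)},
    )


def verify(body: bytes, signature: str, secret: bytes) -> bool:
    """Web-app side: reject requests whose signature does not match the body."""
    # compare_digest avoids leaking timing information.
    return hmac.compare_digest(sign(body, secret), signature)
```

The worker would send it with `urllib.request.urlopen(req, timeout=10)`, matching the 10-second limit, and the Django view would call `verify(request.body, request.META["HTTP_X_TASK_SIGNATURE"], SECRET)` before running the task.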
What are the potential pitfalls of such an approach? Here's one that worries me:
The server can potentially get hammered with many requests all at once that are triggered by a previous request. So one user request might entail 10 concurrent http requests. I suppose I could have a single worker with a sleep before every request to rate-limit.
Any thoughts?
As a user of both Django and Google AppEngine, I can certainly appreciate what you're getting at. At work I'm currently working on the exact same scenario using some pretty cool open source tools.
Take a look at Celery. It's a distributed task queue built with Python that exposes three concepts - a queue, a set of workers, and a result store. It's pluggable with different tools for each part.
The queue should be battle-hardened, and fast. Check out RabbitMQ for a great queue implementation in Erlang, using the AMQP protocol.
The workers ultimately can be Python functions. You can trigger workers using either queue messages or, perhaps more pertinent to what you're describing, webhooks.
Check out the Celery webhook documentation. Using all these tools you can build a production ready distributed task queue that implements your requirements above.
I should also mention that in regards to your first pitfall, celery implements rate-limiting of tasks using a Token Bucket algorithm.
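In Celery this is exposed as the `rate_limit` task option (e.g. `@app.task(rate_limit='10/m')`), which is enforced per worker instance rather than globally. The underlying idea is shown below as a small standalone token bucket, not Celery's own code, as an alternative to the single-worker-with-sleep approach: calls are allowed while tokens remain, and tokens refill at a fixed rate up to a burst capacity. `TokenBucket`, `rate` and `capacity` are illustrative names.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` calls, refilling `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock              # injectable for testing
        self.tokens = float(capacity)   # start with a full bucket
        self.last = clock()

    def _refill(self) -> None:
        # Add the tokens earned since the last call, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self, tokens: float = 1.0) -> bool:
        # Consume tokens if available; never blocks.
        self._refill()
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False

    def wait_time(self, tokens: float = 1.0) -> float:
        # Seconds until `tokens` would be available (0 if available now).
        self._refill()
        return max(0.0, (tokens - self.tokens) / self.rate)
```

A worker would call `try_acquire()` before each outgoing HTTP request and sleep for `wait_time()` when it returns `False`, so a burst of 10 triggered requests is spread out instead of hitting the server all at once.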