I'm having a lot of problems executing certain tasks with celery beat. Some tasks, like the one below, get triggered by beat but the message is never received by rabbitmq.
In my django settings file I have the following periodic task:
CELERYBEAT_SCHEDULE = {
    ...
    'update_locations': {
        'task': 'cron.tasks.update_locations',
        'schedule': crontab(hour='10', minute='0')
    },
    ...
}
At 10 UTC beat executes the task as expected:
[2015-05-13 10:00:00,046: DEBUG/MainProcess] cron.tasks.update_locations sent. id->a1c53d0e-96ca-4673-9d03-972888581176
but this message never arrives at rabbitmq (I'm using the tracing module in rabbitmq to track incoming messages). I have several other tasks which seem to run fine, but certain tasks like the one above never run. Running the task manually in django with cron.tasks.update_locations.delay() works with no problem. Note that my rabbitmq is on a different server than beat.
Is there anything I can do to ensure the message was actually sent and/or received by rabbitmq? Is there a better or other way to schedule these tasks to ensure they run?
A bit hard to answer from this minimal description.
Why is this in the Django settings file? I would have expected the Celery config settings to have their own config object.
Look at http://celery.readthedocs.org/en/latest/reference/celery.html#celery.Celery.config_from_object
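On the question of confirming that beat's message actually reached RabbitMQ, one option worth trying is publisher confirms. Below is a minimal settings sketch, assuming the default py-amqp transport and the old-style uppercase setting names used in the question. As far as I know, confirm_publish is passed through to the AMQP connection, so a publish the broker does not acknowledge should surface as an error in beat instead of disappearing silently. I have not verified this against your Celery version.

```python
# Django settings fragment (sketch). Assumes the py-amqp transport;
# other transports may ignore this option.
BROKER_TRANSPORT_OPTIONS = {
    # Ask RabbitMQ to acknowledge each published message (publisher confirms).
    "confirm_publish": True,
}
```

If beat starts logging publish errors at 10 UTC after this change, the problem is likely on the connection between beat and the remote broker and not in the schedule itself.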
This is a generic question that I'm seeking an answer to because of a celery task I saw in my company's codebase, written by a previous employee.
It's a shared task that calls an endpoint, like this:
# imports assumed; force_authenticate suggests Django REST framework's test client
from celery import shared_task
from rest_framework.test import APIClient

@shared_task(time_limit=60*60)
def celery_task_here(some_args):
    data = get_data(user, url, server_name)
    # some other logic to build csv and stuff

def get_data(user, url, server_name):
    client = APIClient()
    client.force_authenticate(user=user)
    response = client.get(some_url, format='json', SERVER_NAME=server_name)
and all the logic resides in that endpoint.
Now, what I understand is that this will make the server do all the work and does not utilize celery's advantage, but I do see the celery log producing queries when I run this locally. I'd like to know who's actually doing the job in this case: celery or the django server?
If the task is called via celery_task_here.delay, the task will be pushed to a queue, and the worker process that is responsible for handling that queue will actually execute the task, which is not the "Django server". The worker process could potentially be on the same machine as your Django instance; it depends on your environment.
Calling celery_task_here.s(...) only builds a signature and does not run anything by itself. If you call the task as a normal function, celery_task_here(...), it is executed synchronously in the calling process, for example the Django server.
It depends on how the task is called.
If it is called as a celery task with apply_async or delay, then it is executed as a celery task by a celery worker process.
You can still run it as a normal function, without sending it to celery, if you just call it as a function.
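The difference between the two call styles can be illustrated with a toy model. This is not Celery itself, only a stand-in built from a standard-library queue and a thread: delay here is a hypothetical helper that enqueues the call, and the "worker" thread plays the role of the Celery worker process.

```python
import queue
import threading

task_queue = queue.Queue()   # stand-in for the broker queue
executed_by = []             # records which thread ran each call
caller = threading.current_thread().name


def celery_task_here(label):
    # Stand-in for the task body; records who executed it.
    executed_by.append((label, threading.current_thread().name))


def delay(func, *args):
    # Toy equivalent of .delay(): enqueue the call instead of running it.
    task_queue.put((func, args))


def worker():
    # Toy worker loop: pull calls off the queue and execute them.
    while True:
        item = task_queue.get()
        if item is None:     # sentinel: shut down
            break
        func, args = item
        func(*args)


worker_thread = threading.Thread(target=worker, name="worker")
worker_thread.start()

celery_task_here("direct call")        # runs in the calling thread
delay(celery_task_here, "via delay")   # runs in the worker thread

task_queue.put(None)
worker_thread.join()
```

After this runs, executed_by shows the direct call attributed to the caller and the delayed call attributed to the worker. In the question's code the APIClient request is, as far as I know, dispatched in-process by the test client and not sent over HTTP, so the endpoint logic would also run inside whichever process executes the task.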
I have a code base with several apps, each with a tasks.py, and a total of about 100 functions like this one:
@periodic_task(run_every=crontab(minute='20'))
def sync_calendar_availability_and_prices(listing_id_list=None, reapply_rules_after_sync=False):
It's in the old format of celery periodic task definition but works fine on celery==4.1.
These get executed every so many hours or mins via beat and also I call them ad-hoc in the codebase by using .delay(). I want all the .delay() calls to go into a certain celery queue manual_call_queue and periodic beat fired calls for same function to go to periodic_beat_fired_queue -- is this an easy 1-2 line config change somewhere at a global level to do this?
I use rabbitmq, celery, django and django-celery-beat
To send periodic tasks to a specific queue, pass the queue and options arguments:
@periodic_task(run_every=crontab(minute='20'), queue='manual_call_queue', options={'queue': 'periodic_beat_fired_queue'})
def sync_calendar_availability_and_prices(listing_id_list=None, reapply_rules_after_sync=False):
queue='manual_call_queue' is used when the task is invoked with .delay or .apply_async.
options={'queue': 'periodic_beat_fired_queue'} is used when celery beat invokes the task.
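For setups that do not use the periodic_task decorator, roughly the same split can be expressed in configuration. This is only a sketch under assumptions: the module path myapp.tasks and the entry name sync-calendar-hourly are hypothetical, the lowercase setting names are the Celery 4 style, and a plain hourly interval stands in for the original crontab(minute='20') so that the fragment needs no Celery import. My understanding is that an explicit queue in a beat entry's options takes precedence over task_routes, but that is worth checking on your version.

```python
from datetime import timedelta

# Hypothetical dotted path to the task; adjust to the real module.
TASK_NAME = "myapp.tasks.sync_calendar_availability_and_prices"

# Manual .delay() / .apply_async() calls are routed by task name.
task_routes = {
    TASK_NAME: {"queue": "manual_call_queue"},
}

# Beat-fired calls carry their own queue in the entry's options.
beat_schedule = {
    "sync-calendar-hourly": {
        "task": TASK_NAME,
        # The original uses crontab(minute='20'); an interval is a stand-in.
        "schedule": timedelta(hours=1),
        "options": {"queue": "periodic_beat_fired_queue"},
    },
}
```

With django-celery-beat's database scheduler the schedule lives in the database, so the queue would be set on the stored periodic task and not in this dictionary.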
We recently experienced a nasty situation with the celery framework. There were a lot of messages in the queue, however those messages weren't processed. We restarted celery and the messages started being processed again. However we do not want a situation like this happening again and are looking for a permanent solution.
It appears that celery's workers have gone stale. The documentation of celery notes the following on stale workers:
This shows that there’s 2891 messages waiting to be processed in the task queue, and there are two consumers processing them.
One reason that the queue is never emptied could be that you have a stale worker process taking the messages hostage. This could happen if the worker wasn’t properly shut down.
When a message is received by a worker the broker waits for it to be acknowledged before marking the message as processed. The broker will not re-send that message to another consumer until the consumer is shut down properly.
If you hit this problem you have to kill all workers manually and restart them
See documentation
However this relies on manual checking for stale workers, leaving lots of room for error and costing manual labor. What would be a good solution to keep celery working?
You could use supervisor or a supervisor-like tool to deploy the workers; refer to Running the worker as daemon.
Moreover, assuming you are using RabbitMQ, you could monitor the queue status with rabbitmq-management to check whether the queue has become too large. Celery monitoring also provides some mechanisms for this.
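The queue monitoring suggested above could be automated along these lines. This is a sketch, assuming the rabbitmq-management plugin is enabled and its HTTP API is reachable; the threshold max_ready and the function names are hypothetical, and the check is only a heuristic, since a stale worker still counts as a consumer.

```python
import base64
import json
import urllib.parse
import urllib.request


def fetch_queue_stats(base_url, vhost, queue_name, user, password):
    """Fetch one queue's stats from the RabbitMQ management HTTP API."""
    url = "{}/api/queues/{}/{}".format(
        base_url.rstrip("/"),
        urllib.parse.quote(vhost, safe=""),       # "/" becomes "%2F"
        urllib.parse.quote(queue_name, safe=""),
    )
    token = base64.b64encode("{}:{}".format(user, password).encode()).decode()
    request = urllib.request.Request(
        url, headers={"Authorization": "Basic " + token}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def looks_stuck(stats, max_ready=1000):
    """Heuristic: a backlog with no consumers, or a backlog above the threshold."""
    ready = stats.get("messages_ready", 0)
    consumers = stats.get("consumers", 0)
    return ready > 0 and (consumers == 0 or ready > max_ready)
```

A cron job or monitoring agent could call fetch_queue_stats for the worker queue, pass the result to looks_stuck, and then alert or restart the workers through supervisor when it returns True.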
I have scheduled a task using django celery (djcelery) by defining it like so:
@periodic_task(run_every=timedelta(minutes=1))
def mytask():
    # Do something
I decided to remove this task from the codebase.
However, even after restarting the celery server, this task continues to be scheduled every 1 minute, although it reports an error message since this task no longer exists. Do I have to do something to clear old periodic tasks from the djcelery database, in addition to restarting the server?
You might need to remove your task from the table djcelery_periodictask as well.
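What that cleanup amounts to can be shown with a small stand-in. The sketch below uses an in-memory SQLite table with a deliberately simplified schema (the real djcelery_periodictask table has more columns), and the task paths are hypothetical.

```python
import sqlite3

# In-memory stand-in for the Django database; simplified schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE djcelery_periodictask "
    "(id INTEGER PRIMARY KEY, name TEXT, task TEXT)"
)
conn.executemany(
    "INSERT INTO djcelery_periodictask (name, task) VALUES (?, ?)",
    [
        ("myapp.tasks.mytask", "myapp.tasks.mytask"),        # removed from code
        ("myapp.tasks.still_used", "myapp.tasks.still_used"),
    ],
)


def remove_periodic_task(connection, task_path):
    """Delete the stored schedule rows for one task; return rows removed."""
    with connection:  # commits on success
        cursor = connection.execute(
            "DELETE FROM djcelery_periodictask WHERE task = ?", (task_path,)
        )
    return cursor.rowcount


removed = remove_periodic_task(conn, "myapp.tasks.mytask")
remaining = [row[0] for row in conn.execute(
    "SELECT task FROM djcelery_periodictask ORDER BY id"
)]
```

In a real project the same delete would more naturally go through the Django shell and the djcelery.models.PeriodicTask model, or through the admin, followed by a restart of beat.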
1) I am currently working on a web application that exposes a REST api and uses Django and Celery to handle requests and solve them. To solve a request, a set of celery tasks has to be submitted to an amqp queue so that they get executed on workers (located on other machines). Each task is very CPU intensive and takes very long (hours) to finish.
I have configured Celery to also use amqp as the results backend, and I am using RabbitMQ as Celery's broker.
Each task returns a result that needs to be stored afterwards in a DB, but not by the workers directly. Only the "central node" - the machine running django-celery and publishing tasks in the RabbitMQ queue - has access to this storage DB, so the results from the workers have to get back to this machine somehow.
The question is how can I process the results of the tasks execution afterwards? So after a worker finishes, the result from it gets stored in the configured results-backend (amqp), but now I don't know what would be the best way to get the results from there and process them.
All I could find in the documentation is that you can either check on the result's status from time to time with:
result.state
which means that basically I need a dedicated piece of code that runs this command periodically, and therefore keeps a whole thread/process busy with only this, or block everything with:
result.get()
until a task finishes, which is not what I wish.
The only solution I can think of is to have an extra thread on the "central node" that periodically runs a function which checks the async_results returned by each task at submission, and takes action if the task has a finished status.
Does anyone have any other suggestion?
Also, since the processing of the backend results takes place on the "central node", what I aim for is to minimize the impact of this operation on this machine.
What would be the best way to do that?
2) How do people usually solve the problem of dealing with the results returned from the workers and put in the results backend? (assuming that a results backend has been configured)
I'm not sure if I fully understand your question, but take into account each task has a task id. If tasks are being sent by users you can store the ids and then check for the results using json as follows:
# urls.py
from djcelery.views import is_task_successful

urlpatterns += patterns('',
    url(r'(?P<task_id>[\w\d\-\.]+)/done/?$', is_task_successful,
        name='celery-is_task_successful'),
)
Another related concept is signals: each task that finishes successfully emits a task_success signal. More can be found on real time proc.