Celery / Django - How to programatically see workers' states - django

I just installed Celery and I want to create a simple status page that shows the current number of workers and their status.
Is this possible? From web searches the best I found was celery.current_app.control.inspect()
But as far as I can see, it doesn't mention anything about workers. (I'm using Kombu with SQS for the backend if that matters)

In the documentation of celery workers it is explained the output of the inspect commands.
By default using celery.current_app.control.inspect() returns an "inspector object" that allows you to ask for the state of all the running workers. For example if you execute this code with two running workers named 'adder' and 'sleeper':
i = celery.current_app.control.inspect()
i.registered()
the call to i.registered() could return something like:
{
'adder#example.com': ['tasks.add'],
'sleeper#example.com': ['tasks.sleeptask'],
}
In conclusion, the "inspector" methods registered, active, scheduled, etc. return a dictionary with the results classified by the workers chosen when celery.current_app.control.inspect() was called (if no workers are passed as arguments, all workers are implicitly selected).

Related

Scheduler duplicate email 8 times [duplicate]

We have a web app made with pyramid and served through gunicorn+nginx. It works with 8 worker threads/processes
We needed to jobs, we have chosen apscheduler. here is how we launch it
from apscheduler.events import EVENT_JOB_EXECUTED, EVENT_JOB_ERROR
from apscheduler.scheduler import Scheduler
rerun_monitor = Scheduler()
rerun_monitor.start()
rerun_monitor.add_interval_job(job_to_be_run,\
seconds=JOB_INTERVAL)
The issue is that all the worker processes of gunicorn pick the scheduler up. We tried implementing a file lock but it does not seem like a good enough solution. What would be the best way to make sure at any given time only one of the worker process picks the scheduled event up and no other thread picks it up till next JOB_INTERVAL?
The solution needs to work even with mod_wsgi in case we decide to switch to apache2+modwsgi later. It needs to work with single process development server which is waitress.
Update from the bounty sponsor
I'm facing the same issue described by the OP, just with a Django app. I'm mostly sure adding this detail won't change much if the original question. For this reason, and to gain a bit more of visibility, I also tagged this question with django.
Because Gunicorn is starting with 8 workers (in your example), this forks the app 8 times into 8 processes. These 8 processes are forked from the Master process, which monitors each of their status & has the ability to add/remove workers.
Each process gets a copy of your APScheduler object, which initially is an exact copy of your Master processes' APScheduler. This results in each "nth" worker (process) executing each job a total of "n" times.
A hack around this is to run gunicorn with the following options:
env/bin/gunicorn module_containing_app:app -b 0.0.0.0:8080 --workers 3 --preload
The --preload flag tells Gunicorn to "load the app before forking the worker processes". By doing so, each worker is "given a copy of the app, already instantiated by the Master, rather than instantiating the app itself". This means the following code only executes once in the Master process:
rerun_monitor = Scheduler()
rerun_monitor.start()
rerun_monitor.add_interval_job(job_to_be_run,\
seconds=JOB_INTERVAL)
Additionally, we need to set the jobstore to be anything other than :memory:.This way, although each worker is its own independent process unable of communicating with the other 7, by using a local database (rather then memory) we guarantee a single-point-of-truth for CRUD operations on the jobstore.
from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore
rerun_monitor = Scheduler(
jobstores={'default': SQLAlchemyJobStore(url='sqlite:///jobs.sqlite')})
rerun_monitor.start()
rerun_monitor.add_interval_job(job_to_be_run,\
seconds=JOB_INTERVAL)
Lastly, we want to use the BackgroundScheduler because of its implementation of start(). When we call start() in the BackgroundScheduler, a new thread is spun up in the background, which is responsible for scheduling/executing jobs. This is significant because remember in step (1), due to our --preload flag we only execute the start() function once, in the Master Gunicorn process. By definition, forked processes do not inherit the threads of their Parent, so each worker doesn't run the BackgroundScheduler thread.
from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore
rerun_monitor = BackgroundScheduler(
jobstores={'default': SQLAlchemyJobStore(url='sqlite:///jobs.sqlite')})
rerun_monitor.start()
rerun_monitor.add_interval_job(job_to_be_run,\
seconds=JOB_INTERVAL)
As a result of all of this, every Gunicorn worker has an APScheduler that has been tricked into a "STARTED" state, but actually isn't running because it drops the threads of it's parent! Each instance is also capable of updating the jobstore database, just not executing any jobs!
Check out flask-APScheduler for a quick way to run APScheduler in a web-server (like Gunicorn), and enable CRUD operations for each job.
I found a fix that worked with a Django project having a very similar issue. I simply bind a TCP socket the first time the scheduler starts and check against it subsequently. I think the following code can work for you as well with minor tweaks.
import sys, socket
try:
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("127.0.0.1", 47200))
except socket.error:
print "!!!scheduler already started, DO NOTHING"
else:
from apscheduler.schedulers.background import BackgroundScheduler
scheduler = BackgroundScheduler()
scheduler.start()
print "scheduler started"
Short answer: You can't do it properly without consequences.
I'm using Gunicorn as an example, but it is essentially the same for uWSGI. There are various hacks when running multiple processes, to name a few:
use --preload option
use on_starting hook to start the APScheduler background scheduler
use when_ready hook to start the APScheduler background scheduler
They work to some extent but may get the following errors:
worker timing out frequently
scheduler hanging when there are no jobs https://github.com/agronholm/apscheduler/issues/305
APScheduler is designed to run in a single process where it has complete control over the process of adding jobs to job stores. It uses threading.Event's wait() and set() methods to coordinate. If they are run by different processes, the coordination wouldn't work.
It is possible to run it in Gunicorn in a single process.
use only one worker process
use the post_worker_init hook to start the scheduler, this will make sure the scheduler is run only in the worker process but not the master process
The author also pointed out sharing the job store amount multiple processes isn't possible. https://apscheduler.readthedocs.io/en/stable/faq.html#how-do-i-share-a-single-job-store-among-one-or-more-worker-processes He also provided a solution using RPyC.
While it's entirely doable to wrap APScheduler with a REST interface. You might want to consider serving it as a standalone app with one worker. In another word, if you have others endpoints, put them in another app where you can use multiple workers.

Retry celery tasks before their countdown

we are running Django v1.10 with celery v4.0.2, rabbitMQ v3.5.7 and flower v0.9.1 and are pretty new with celery, rabbitMQ and flower.
There is a function x() which was set to retry after 7 days in case of failure. We have 1000's of instances of x re-queued on production. We have fixed the issue and would like to retry the instances asap.
Is there a way to force the retry before it's scheduled time?
If you can get a list of the tasks, it's a matter of calling task.retry(exc=exc) on each one. See docs.
Try celery.task.control.inspect().reserved() and see if you can filter the tasks that way. Example here.
You get a task object from its id with this, per this answer.
result = MyTask.AsyncResult(task_id)
result.get()
So after trying a lot of things, I had to solve it by getting the list of scheduled tasks and their arguments and calling the functions in a for loop with the arguments.
There is apparently no way to retry a task manually by design. You have to create another task with the same parameters. This is what I finally did:
i = inspect()
scheduled = i.scheduled()
for key in scheduled:
for element in scheduled[key]:
reqDict = element['request']
if reqDict['type']=='module.function':
module.function.delay(converted_arguments)
revoke(reqDict['id'], terminate=True)

Multiple celery server but same redis broker executing task twice

When I inspect celery -A proj inspect active_queues I see two servers showing their queues they are listening to and they are pointing to same default queue name celery. Still the task issued by django app gets executed twice by both servers(Once by each celery server - so two times).
I can see the transport type is also direct - the default one.
On my local task gets executed once so I am sure that the task is called only once by my django app.
What can I be missing here?
Ok, i looked up the docs, i think you need to set celerybeat-scheduler in your settings.py which makes sure tasks are being scheduled by a single scheduler.
http://celery.readthedocs.org/en/latest/configuration.html#celerybeat-scheduler
On Redis you can set the current database for the application you are running, setting the database will separate the information to use different apps.
If you are using Django the configuration is
CELERY_BROKER_VHOST = {number of the database}
If you are not using Django i beleive the configuration is CELERY_REDIS_DB or redis_db depending on your celery version
For instance for your first application could be CELERY_BROKER_VHOST = 1
For the second application could be CELERY_BROKER_VHOST = 2
and for your local development could be CELERY_BROKER_VHOST = 99
http://docs.celeryproject.org/en/latest/userguide/configuration.html#id8

Django-celery project, how to handle results from result-backend?

1) I am currently working on a web application that exposes a REST api and uses Django and Celery to handle request and solve them. For a request in order to get solved, there have to be submitted a set of celery tasks to an amqp queue, so that they get executed on workers (situated on other machines). Each task is very CPU intensive and takes very long (hours) to finish.
I have configured Celery to use also amqp as results-backend, and I am using RabbitMQ as Celery's broker.
Each task returns a result that needs to be stored afterwards in a DB, but not by the workers directly. Only the "central node" - the machine running django-celery and publishing tasks in the RabbitMQ queue - has access to this storage DB, so the results from the workers have to return somehow on this machine.
The question is how can I process the results of the tasks execution afterwards? So after a worker finishes, the result from it gets stored in the configured results-backend (amqp), but now I don't know what would be the best way to get the results from there and process them.
All I could find in the documentation is that you can either check on the results's status from time to time with:
result.state
which means that basically I need a dedicated piece of code that runs periodically this command, and therefore keeps busy a whole thread/process only with this, or to block everything with:
result.get()
until a task finishes, which is not what I wish.
The only solution I can think of is to have on the "central node" an extra thread that runs periodically a function that basically checks on the async_results returned by each task at its submission, and to take action if the task has a finished status.
Does anyone have any other suggestion?
Also, since the backend-results' processing takes place on the "central node", what I aim is to minimize the impact of this operation on this machine.
What would be the best way to do that?
2) How do people usually solve the problem of dealing with the results returned from the workers and put in the backend-results? (assuming that a backend-results has been configured)
I'm not sure if I fully understand your question, but take into account each task has a task id. If tasks are being sent by users you can store the ids and then check for the results using json as follows:
#urls.py
from djcelery.views import is_task_successful
urlpatterns += patterns('',
url(r'(?P<task_id>[\w\d\-\.]+)/done/?$', is_task_successful,
name='celery-is_task_successful'),
)
Other related concept is that of signals each finished task emits a signal. A finnished task will emit a task_success signal. More can be found on real time proc.

django celery rabbitmq execute delay

I use Django-Celery +rabbitmq to execute some asyn tasks,I define a queue 'sendmail' to execute send email task,send mail is triggered by a specific task(this task has own queue), but now I encounter a problem,after the specific task finish, the mail sometimes send at once, sometimes need 5-20minutes.I want to know what reason caused it.
Django-celery will package the taskname and param as message to rabbitmq when call task.delay().
I want to know when the message go to the rabbitmq, but use web management tool only can see total messages,can't see the every message's detail, especially the time the message reached. Django-celery log can only see the work got from broker time and execute task time.I want to know all related timepoint to sure which step the time main consumed.
Django-Celery does (I believe) report task data on a per-task basis. When you sync your database, it crates a bunch of monitoring tables which are accessible via the admin. However, in order for these tasks to be recorded in these tables, you need to run the celerycam program in the django context (python ./manage.py celerycam). The celerycam program will take "snapshots" of your tasks every second or so (by default) and record information about them. Another useful tool for monitoring is the celerymon program (which also has to run in the django context). This is a command line ncurses program that reports real-time information about tasks as they occur. Finally, rabbitmqctrl has a bunch of options that might help with monitoring.
This is a particularly useful page in the docs:
http://celery.github.com/celery/userguide/monitoring.html
Anyway, this is what I use to monitor my tasks when using celery.