Scheduled Celery Task Lost From Redis - django

I'm using Celery in Django with Redis as the Broker.
Tasks are being scheduled for the future using the eta argument in apply_async.
After scheduling the task, I can run celery -A MyApp inspect scheduled and I see the task with the proper eta for the future (24 hours in the future).
Before the scheduled time, if I restart Redis (with service redis restart) or the server reboots, running celery -A MyApp inspect scheduled again shows "- empty -".
All scheduled tasks are lost after Redis restarts.
Redis is set up with AOF, so it shouldn't be losing DB state after restarting.
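For reference, the scheduling call looks roughly like this (a minimal sketch; the task name, arguments, and module are illustrative, only the eta usage matters here):
from datetime import datetime, timedelta, timezone
from myapp.tasks import my_task  # hypothetical task module

# Schedule the task to run 24 hours from now via the eta argument.
eta = datetime.now(timezone.utc) + timedelta(hours=24)
my_task.apply_async(args=[42], eta=eta)
Immediately after this call, celery -A MyApp inspect scheduled lists the task with that eta.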
EDIT
After some more research, I found out that running redis-cli -n 0 hgetall unacked both before and after the Redis restart shows the task in the queue. So Redis still has knowledge of the task, but for some reason when Redis restarts, the task is removed from the worker? And then it is never sent again and just stays indefinitely in the unacked queue.
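A quick way to dump that hash from Python when comparing broker state before and after the restart (a minimal sketch with redis-py; db 0 and the unacked key name come from the redis-cli command above):
import redis

r = redis.Redis(db=0)
# kombu keeps delivered-but-unacknowledged messages in the "unacked" hash
unacked = r.hgetall("unacked")
print(len(unacked), "unacked message(s)")
for delivery_tag, raw_message in unacked.items():
    print(delivery_tag, raw_message[:120])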

Related

Celery and RabbitMQ timeouts and connection resets

I'm using RabbitMQ 3.6.0 and Celery 3.1.20 on a Windows 10 machine in a Django application. Everything is running on the same computer. I've configured Celery to Acknowledge Late (CELERY_ACKS_LATE=True) and now I'm getting connection problems.
I start the Celery worker, and after 50-60 seconds of handling tasks each worker thread fails with the following message:
Couldn't ack ###, reason:ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None)
(### is the number of the task)
When I look at the RabbitMQ logs I see this:
=INFO REPORT==== 10-Feb-2016::22:16:16 ===
accepting AMQP connection <0.247.0> (127.0.0.1:55372 -> 127.0.0.1:5672)
=INFO REPORT==== 10-Feb-2016::22:16:16 ===
accepting AMQP connection <0.254.0> (127.0.0.1:55373 -> 127.0.0.1:5672)
=ERROR REPORT==== 10-Feb-2016::22:17:14 ===
closing AMQP connection <0.247.0> (127.0.0.1:55372 -> 127.0.0.1:5672):
{writer,send_failed,{error,timeout}}
The error occurs exactly when the Celery workers are getting their connection reset.
I thought this was an AMQP Heartbeat issue, so I've added BROKER_HEARTBEAT = 15 to my Celery settings, but it did not make any difference.
I was having a similar issue with Celery on Windows with long running
tasks with concurrency=1. The following configuration finally worked for
me:
CELERY_ACKS_LATE = True
CELERYD_PREFETCH_MULTIPLIER = 1
I also started the celery worker daemon with the -Ofair option:
celery -A test worker -l info -Ofair
In my limited understanding, CELERYD_PREFETCH_MULTIPLIER sets the number
of messages that sit in the queue of a specific Celery worker. By
default it is set to 4. If you set it to 1, each worker will only
consume one message and complete the task before it consumes another
message. I was having issues with long-running tasks because the
connection to RabbitMQ was consistently lost in the middle of a long task, but
then the task was re-attempted if any other messages/tasks were waiting
in the celery queue.
The following option was also specific to my situation:
CELERYD_CONCURRENCY = 1
Setting concurrency to 1 made sense for me because I had long running
tasks that needed a large amount of RAM so they each needed to run solo.
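Put together, the relevant settings from above (a consolidated sketch, not a complete config module):
CELERY_ACKS_LATE = True            # ack only after the task finishes
CELERYD_PREFETCH_MULTIPLIER = 1    # each worker process prefetches a single message
CELERYD_CONCURRENCY = 1            # one long-running, memory-heavy task at a time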
@bbaker's solution with CELERY_ACKS_LATE (which is task_acks_late in Celery 4.x) by itself did not work for me. My workers are in Kubernetes pods and must be run with --pool solo, and each task takes 30-60s.
I solved it by also including broker_heartbeat = 0. The full configuration:
broker_pool_limit = None
task_acks_late = True
broker_heartbeat = 0
worker_prefetch_multiplier = 1
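For reference, here is roughly how those settings can be applied on a Celery 4.x app object (a sketch; the app name and broker URL are placeholders, not taken from the original setup):
from celery import Celery

app = Celery("myapp", broker="amqp://guest:guest@localhost:5672//")  # placeholder broker URL

app.conf.update(
    broker_pool_limit=None,
    task_acks_late=True,
    broker_heartbeat=0,            # disable AMQP heartbeats entirely
    worker_prefetch_multiplier=1,  # fetch one message at a time
)

# Worker started as described above, e.g.:
#   celery -A myapp worker --pool solo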

RabbitMQ not closing old connections with Celery

I use Celery with Django to consume/publish tasks to RabbitMQ from ~20 workers across a few datacenters. After about a month or so, I'm at 8000 open socket descriptors and the number keeps increasing until I restart RabbitMQ. Often I "kill -9" the Celery worker processes instead of shutting them down, since I do not want to wait for jobs to finish. On the workers I do not see the connections that RabbitMQ is showing. Is there a way to purge the old connections from RabbitMQ?
I'm using Celery 3.1.13 and RabbitMQ 3.2.4, all on Ubuntu 14.04. I'm not using librabbitmq, but pyamqp.
I was getting the same issue with the following 3-machine setup:
Worker (Ubuntu 14.04)
amqp==1.4.6
celery==3.1.13
kombu==3.0.21
Django App Server (Ubuntu 14.04)
amqp==1.4.2
celery==3.1.8
kombu==3.0.10
RabbitMQ Server (Ubuntu 14.04 | rabbitmq-server 3.2.4)
Each task the worker received opened one connection that never closed (according to the RabbitMQ log) and consumed ~2-3 MB of memory.
I have since upgraded Celery to the latest version on my Django server and the socket descriptors and memory usage are holding steady.
I also see the connections close in the RabbitMQ log after the task completes, like so:
closing AMQP connection <0.12345.0> (192.168.1.100:54321 -> 192.168.1.100:5672):
connection_closed_abruptly
Use BROKER_HEARTBEAT in Django's settings.py file.
RabbitMQ expects this value from the client (Celery, in this case).
Refer to
http://docs.celeryproject.org/en/latest/userguide/configuration.html#std:setting-broker_heartbeat for more details.
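For example, in settings.py (the value here is illustrative, not a recommendation):
BROKER_HEARTBEAT = 10  # heartbeat interval in seconds requested from the broker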

Jobs firing to wrong celery

I am using django celery and rabbitmq as my broker (the guest rabbit user has full access on the local machine). I have a bunch of projects, all in their own virtualenvs, but recently needed celery on 2 of them. I have one instance of rabbitmq running.
(project1_env)python manage.py celery worker
normal celery stuff....
[Configuration]
broker: amqp://guest@localhost:5672//
app: default:0x101bd2250 (djcelery.loaders.DjangoLoader)
[Queues]
push_queue: exchange:push_queue(direct) binding:push_queue
In my other project
(project2_env)python manage.py celery worker
normal celery stuff....
[Configuration]
broker: amqp://guest@localhost:5672//
app: default:0x101dbf450 (djcelery.loaders.DjangoLoader)
[Queues]
job_queue: exchange:job_queue(direct) binding:job_queue
When I run a task in project1 code, it fires to the project1 celery just fine in the push_queue. The problem is that when I am working in project2, any task tries to fire in the project1 celery, even if celery isn't running on project1.
If I fire back up project1_env and start celery I get
Received unregistered task of type 'update-jobs'.
If I run list_queues in rabbit, it shows all the queues
...
push_queue 0
job_queue 0
...
My env settings CELERYD_CHDIR and CELERY_CONFIG_MODULE are both blank.
Some things I have tried:
purging celery
force_reset on rabbitmq
rabbitmq virtual hosts as outlined in this answer: Multi Celery projects with same RabbitMQ broker backend process
moving django celery setting out and setting CELERY_CONFIG_MODULE to the proper settings
setting the CELERYD_CHDIR in both projects to the proper directory
None of these things has stopped project2 tasks from trying to run in the project1 celery.
I am on Mac, if that makes a difference or helps.
UPDATE
Setting up different virtual hosts made it all work. I just had it configured wrong.
If you're going to be using the same RabbitMQ instance for both Celery instances, you'll want to use virtual hosts. This is what we use and it works. You mention that you've tried it, but your broker URLs are both amqp://guest@localhost:5672//, with no virtual host specified. If both Celery instances are connected to the same host and virtual host, they will produce to and consume from the same set of queues.
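A minimal sketch of what that separation looks like, assuming vhosts named project1 and project2 (the names and credentials are placeholders; the vhosts themselves are created with rabbitmqctl add_vhost / set_permissions as in the linked answer):
# project1 settings
BROKER_URL = "amqp://guest:guest@localhost:5672/project1"  # vhost is the path segment

# project2 settings
BROKER_URL = "amqp://guest:guest@localhost:5672/project2"
With each project pointed at its own vhost, the two workers no longer see each other's queues.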

[Django Celery] Celery blocked doing IO tasks

I use celery to do some IO tasks, such as grabbing remote images and sending email to users.
But celery sometimes blocks with no logs. When that happens, it won't run any task I send. I have to restart it, and then it picks up from where it blocked.
This has puzzled me for a very long time. What can I do? And what is the best practice for distributing IO tasks with celery?
By default, the celery worker forks several processes that wait for task requests from clients.
For IO-bound tasks, your system needs a higher level of concurrency to handle requests concurrently. Here is the command:
celery -A tasks worker --without-heartbeat -P threads --concurrency=10
If there are a lot of simultaneous incoming requests, your concurrency level has to be set higher than the size of the incoming request burst.
The system's performance may be limited by the hardware memory size or the OS's select API.
You can use celery's thread/gevent pool when concurrency is large:
celery -A tasks worker --without-heartbeat -P threads --concurrency=1000
or
celery -A tasks worker --without-heartbeat -P gevent --concurrency=1000
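The tasks themselves stay ordinary Celery tasks; only the pool and concurrency change. A sketch of an IO-bound task like the ones described in the question (the module name, broker URL, and timeout are illustrative):
from urllib.request import urlopen
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")  # placeholder broker

@app.task
def grab_remote_image(url):
    # Almost all of the time here is spent waiting on the network, so a
    # thread or gevent pool can keep many of these in flight at once.
    with urlopen(url, timeout=30) as response:
        return response.read()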
You can increase the celery concurrency:
manage.py celeryd --concurrency=3
where concurrency == the number of processors.
Run the shell command
grep -c processor /proc/cpuinfo
to get the number of processors on your machine.
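An equivalent way to get that number from Python, if you'd rather not shell out (purely a convenience):
import multiprocessing
print(multiprocessing.cpu_count())  # same count that grep -c processor /proc/cpuinfo reports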

Celery and Redis keep running out of memory

I have a Django app deployed to Heroku, with a worker process running celery (+ celerycam for monitoring). I am using RedisToGo's Redis database as a broker. I noticed that Redis keeps running out of memory.
This is what my procfile looks like:
web: python app/manage.py run_gunicorn -b "0.0.0.0:$PORT" -w 3
worker: python lipo/manage.py celerycam & python app/manage.py celeryd -E -B --loglevel=INFO
Here's the output of KEYS '*':
"_kombu.binding.celeryd.pidbox"
"celeryev.643a99be-74e8-44e1-8c67-fdd9891a5326"
"celeryev.f7a1d511-448b-42ad-9e51-52baee60e977"
"_kombu.binding.celeryev"
"celeryev.d4bd2c8d-57ea-4058-8597-e48f874698ca"
"_kombu.binding.celery"
celeryev.643a99be-74e8-44e1-8c67-fdd9891a5326 is getting filled up with these messages:
{"sw_sys": "Linux", "clock": 1, "timestamp": 1325914922.206671, "hostname": "064d9ffe-94a3-4a4e-b0c2-be9a85880c74", "type": "worker-online", "sw_ident": "celeryd", "sw_ver": "2.4.5"}
Any idea what I can do to purge these messages periodically?
Would something like this be a solution?
In addition to the _kombu.binding.celeryev set, there would be celeryev.i-am-alive. keys with a TTL set (e.g. 30 sec).
The celeryev process adds itself to the bindings and periodically (e.g. every 5 sec) updates its celeryev.i-am-alive. key to reset the TTL.
Before sending an event, the worker process checks not only SMEMBERS on _kombu.binding.celeryev but the individual celeryev.i-am-alive. keys as well, and if a key is not found (expired), that consumer gets removed from _kombu.binding.celeryev (and maybe DEL celeryev. or EXPIRE celeryev. commands are executed for its event key).
We can't just use the KEYS command because it is O(N), where N is the total number of keys in the DB. TTLs can be tricky on redis < 2.1, though.
EXPIRE celeryev. instead of DEL celeryev. could be used in order to allow a temporarily offline celeryev consumer to revive, but I don't know if it's worth it.
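A rough sketch of that scheme with redis-py (key names follow the KEYS output above; the TTL, the refresh interval, and the assumption that each set member maps directly to a consumer id are all illustrative):
import redis

r = redis.Redis()

BINDINGS_KEY = "_kombu.binding.celeryev"  # set of registered event consumers
ALIVE_PREFIX = "celeryev.i-am-alive."

def refresh_alive(consumer_id, ttl=30):
    # The celeryev process would call this every few seconds (e.g. every 5 s)
    # to reset the TTL on its own liveness key.
    r.set(ALIVE_PREFIX + consumer_id, 1, ex=ttl)

def prune_dead_consumers(event_key_ttl=60):
    # Before publishing an event, a worker drops consumers whose liveness
    # key has expired, so their event lists stop growing.
    for member in r.smembers(BINDINGS_KEY):
        consumer_id = member.decode()
        if not r.exists(ALIVE_PREFIX + consumer_id):
            r.srem(BINDINGS_KEY, member)
            # EXPIRE instead of DEL, so a temporarily offline consumer can revive.
            r.expire("celeryev." + consumer_id, event_key_ttl)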