Celery throws connection reset by peer error - python-2.7

I have a kombu consumer that periodically sends a task to my Celery application (using a RabbitMQ backend) like this:
celery_app.send_task('fetch_data', args=[], kwargs={})
The Celery app receives the task and executes it. However, after a while, when it receives another task in the same way, I get:
[Errno 104] Connection reset by peer
In the rabbitmq logs I see a lot of heartbeat missed warnings.
I have already tried setting
broker_pool_limit = None
in my celery app, but it did not solve the problem.
How do I rectify this?

Related

504 gateway timeout flask socketio

I am working on a flask-socketio server that gets stuck in a state where only 504s (gateway timeout) are returned. We are using an AWS ELB in front of the server. I would appreciate any tips on how to debug this issue.
Other symptoms:
This problem does not occur consistently, but once it begins happening, only 504s are received from requests. Restarting the process seems to fix the issue.
When I run netstat -nt on the server, I see many entries with a Recv-Q of over 100 stuck in the CLOSE_WAIT state
When I run strace on the process, I only see select and clock_gettime
When I run tcpdump on the server, I can see the valid requests coming into the server
AWS health checks are coming back successfully
EDIT:
I should also add two things:
flask-socketio's server is used for production (not gunicorn or uWSGI)
Python's daemonize function is used for daemonizing the app
It seemed that switching to gunicorn as the wsgi server fixed the problem. This legitimately might be an issue with the flask-socketio wsgi server.
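To track the CLOSE_WAIT symptom over time, the netstat check from the question could be scripted. This is only a rough sketch, assuming the Linux column layout of netstat -nt (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the function name and the threshold of 100 are made up for illustration.

```python
def count_stuck_close_wait(netstat_output, min_recv_q=100):
    """Count TCP sockets in CLOSE_WAIT whose Recv-Q is at least min_recv_q.

    Assumes Linux `netstat -nt` rows of the form:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    stuck = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if not fields[1].isdigit():
            continue
        if fields[5] == "CLOSE_WAIT" and int(fields[1]) >= min_recv_q:
            stuck += 1
    return stuck
```

The input could come from subprocess.check_output(["netstat", "-nt"], universal_newlines=True). A count that keeps growing would suggest the server process is no longer reading from or closing sockets the peer has already closed.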

Cannot connect Celery to RabbitMQ on Windows Server

I am trying to set up RabbitMQ as a message broker for Celery on Windows Server 2012 R2. After starting the RabbitMQ server using the RabbitMQ start service entry in the applications menu, I try to start the Celery app with this command:
celery -A proj worker -l info
I get the following error after the above command.
[2018-01-09 10:03:02,515: ERROR/MainProcess] consumer: Cannot connect to amqp://guest:**@127.0.0.1:5672//: [WinError 10042] An unknown, invalid, or unsupported option or level was specified in a getsockopt or setsockopt call.
Trying again in 2.00 seconds...
So I tried debugging by checking the status of the RabbitMQ server: I opened the RabbitMQ command prompt and ran rabbitmqctl status, which gave the following response.
These are the services that I used to start RabbitMQ and the RabbitMQ command line
Here's my Django settings for Celery. I tried putting ports and usernames before and after the hosts, but same error.
CELERY_BROKER_URL = 'amqp://localhost//'
CELERY_RESULT_BACKEND = 'amqp://localhost//'
What is the issue here? How do I check whether the RabbitMQ service has started? What settings do I need to put in the Django settings file?
I was fighting the same issue. Ended up downgrading amqp to 2.1.3 based on the open issue in py-amqp:
https://github.com/celery/py-amqp/issues/130
Uninstall amqp using pip uninstall amqp
Install amqp using pip install -Iv amqp==2.1.3
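On the question of how to check whether RabbitMQ has actually started: one rough way, besides rabbitmqctl status, is to test whether anything accepts TCP connections on the AMQP port. The sketch below uses only the standard library; the function name is made up, and the host and port are simply the 127.0.0.1:5672 shown in the error message.

```python
import socket


def broker_port_open(host="127.0.0.1", port=5672, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False
    conn.close()
    return True
```

If broker_port_open() returns True while Celery still fails with WinError 10042, the broker is probably running and the problem is more likely on the client side, which would fit the py-amqp issue linked above. Note that this only shows that something is listening on the port, not that it speaks AMQP.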

Celery and RabbitMQ timeouts and connection resets

I'm using RabbitMQ 3.6.0 and Celery 3.1.20 on a Windows 10 machine in a Django application. Everything is running on the same computer. I've configured Celery to Acknowledge Late (CELERY_ACKS_LATE=True) and now I'm getting connection problems.
I start the Celery worker, and after 50-60 seconds of handling tasks each worker thread fails with the following message:
Couldn't ack ###, reason:ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None)
(### is the number of the task)
When I look at the RabbitMQ logs I see this:
=INFO REPORT==== 10-Feb-2016::22:16:16 ===
accepting AMQP connection <0.247.0> (127.0.0.1:55372 -> 127.0.0.1:5672)
=INFO REPORT==== 10-Feb-2016::22:16:16 ===
accepting AMQP connection <0.254.0> (127.0.0.1:55373 -> 127.0.0.1:5672)
=ERROR REPORT==== 10-Feb-2016::22:17:14 ===
closing AMQP connection <0.247.0> (127.0.0.1:55372 -> 127.0.0.1:5672):
{writer,send_failed,{error,timeout}}
The error occurs exactly when the Celery workers are getting their connection reset.
I thought this was an AMQP Heartbeat issue, so I've added BROKER_HEARTBEAT = 15 to my Celery settings, but it did not make any difference.
I was having a similar issue with Celery on Windows with long running
tasks with concurrency=1. The following configuration finally worked for
me:
CELERY_ACKS_LATE = True
CELERYD_PREFETCH_MULTIPLIER = 1
I also started the celery worker daemon with the -Ofair option:
celery -A test worker -l info -Ofair
In my limited understanding, CELERYD_PREFETCH_MULTIPLIER sets the number of messages that sit in the queue of a specific Celery worker. By default it is set to 4. If you set it to 1, each worker will only consume one message and complete the task before it consumes another message. I was having issues with long-running tasks because the connection to RabbitMQ was consistently lost in the middle of the long task, and the task was then re-attempted if any other messages/tasks were waiting in the Celery queue.
The following option was also specific to my situation:
CELERYD_CONCURRENCY = 1
Setting concurrency to 1 made sense for me because I had long running
tasks that needed a large amount of RAM so they each needed to run solo.
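To put numbers on the prefetch setting discussed in this answer: as far as I understand the Celery docs, the multiplier is applied per concurrent worker process, so the number of messages a worker reserves is roughly the multiplier times the concurrency. A tiny sketch of that arithmetic (the function name is made up; this is not a Celery API):

```python
def prefetch_limit(prefetch_multiplier=4, concurrency=1):
    """Approximate number of messages one worker reserves from the broker.

    Assumes the documented rule: multiplier times number of concurrent processes.
    """
    return prefetch_multiplier * concurrency


# Defaults with a single process: up to 4 messages held by the worker.
default_limit = prefetch_limit(4, 1)
# The configuration from this answer: one message at a time.
tuned_limit = prefetch_limit(1, 1)
```

With acks_late enabled, every reserved message stays unacknowledged on the connection while a long task runs, which is why keeping this number at 1 helps for long-running tasks.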
@bbaker's solution with CELERY_ACKS_LATE (which is task_acks_late in Celery 4.x) did not work for me on its own. My workers are in Kubernetes pods and must be run with --pool solo, and each task takes 30-60s.
I solved it by including broker_heartbeat = 0:
broker_pool_limit = None
task_acks_late = True
broker_heartbeat = 0
worker_prefetch_multiplier = 1
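Since the answers in this thread mix the old uppercase setting names (Celery 3.x) with the lowercase ones (Celery 4.x), here is a small sketch of how the names used here correspond. The mapping covers only the settings mentioned in this thread, and the helper function is made up for illustration.

```python
# Old (Celery 3.x) name -> new (Celery 4.x) name, for settings named in this thread.
OLD_TO_NEW = {
    "CELERY_ACKS_LATE": "task_acks_late",
    "CELERYD_PREFETCH_MULTIPLIER": "worker_prefetch_multiplier",
    "CELERYD_CONCURRENCY": "worker_concurrency",
    "BROKER_HEARTBEAT": "broker_heartbeat",
    "BROKER_POOL_LIMIT": "broker_pool_limit",
}


def translate_settings(old_settings):
    """Return a copy of old_settings with known old names replaced by new ones.

    Names not in OLD_TO_NEW are passed through unchanged.
    """
    return {OLD_TO_NEW.get(name, name): value for name, value in old_settings.items()}
```

For example, translate_settings({"CELERY_ACKS_LATE": True, "CELERYD_PREFETCH_MULTIPLIER": 1}) gives the lowercase equivalents used in the configuration above.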

RabbitMQ not closing old connections with Celery

I use Celery with Django to consume/publish tasks to RabbitMQ from ~20 workers across a few datacenters. After about a month or so, I'm at 8000 open socket descriptors and the number keeps increasing until I restart RabbitMQ. Often I "kill -9" the Celery worker process instead of shutting them down since I do not want to wait for jobs to finish. On the workers I do not see the connections that RabbitMQ is showing. Is there a way to purge the old connections from RabbitMQ?
I'm using Celery 3.1.13 and RabbitMQ 3.2.4, all on Ubuntu 14.04. I'm not using librabbitmq, but pyamqp.
I was getting the same issue with the following 3-machine setup:
Worker (Ubuntu 14.04)
amqp==1.4.6
celery==3.1.13
kombu==3.0.21
Django App Server (Ubuntu 14.04)
amqp==1.4.2
celery==3.1.8
kombu==3.0.10
Rabbit MQ Server (Ubuntu 14.04 | rabbitmq-server 3.2.4)
Each task the worker received opened one connection that never closed (according to the RabbitMQ log) and consumed ~2-3 MB of memory.
I have since upgraded Celery to the latest version on my Django server and the socket descriptors and memory usage are holding steady.
I also see the connections close in the RabbitMQ log after the task completes, like so:
closing AMQP connection <0.12345.0> (192.168.1.100:54321 -> 192.168.1.100:5672):
connection_closed_abruptly
Use BROKER_HEARTBEAT in Django's settings.py file.
RabbitMQ expects this value from the client (Celery in this case).
Refer to
http://docs.celeryproject.org/en/latest/userguide/configuration.html#std:setting-broker_heartbeat for more details.
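As a rough illustration of that setting, a settings.py fragment might look like the sketch below. The value of 10 seconds is only an example, not a recommendation from this answer. The helper shows how, according to the linked docs, the heartbeat value and the check rate (default 2.0) combine into the interval at which the worker checks for missed heartbeats.

```python
# settings.py fragment (values are illustrative assumptions)
BROKER_HEARTBEAT = 10             # seconds, negotiated with RabbitMQ by the client
BROKER_HEARTBEAT_CHECKRATE = 2.0  # Celery's documented default


def heartbeat_check_interval(heartbeat, checkrate=2.0):
    """Seconds between heartbeat checks: heartbeat divided by the check rate."""
    return heartbeat / checkrate


check_every = heartbeat_check_interval(BROKER_HEARTBEAT, BROKER_HEARTBEAT_CHECKRATE)
```

With these example values the check runs every 5 seconds. If connections are killed without a clean shutdown, the heartbeat is what lets RabbitMQ notice the dead peer and close its side.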

Connection Error with Django/Celery and CloudAMQP/Heroku

I have a Django app that I've already deployed to Heroku. This app uses Celery for message queuing and I've run it locally using RabbitMQ without incident.
Unfortunately, when I went to deploy this to Heroku, I found that the RabbitMQ addon wasn't available and that I'd have to use CloudAMQP. The documentation for both CloudAMQP and Heroku led me to believe that I can use Celery (even though they recommend Pika), but when I try to deploy, I get connection errors for both my scheduler and worker processes. Here are the exact errors:
2012-07-09T16:46:22+00:00 app[scheduler.1]: [2012-07-09 11:46:22,234: ERROR/Beat] Celerybeat: Connection error: [Errno 111] Connection refused. Trying again in 2.0 seconds...
2012-07-09T16:46:23+00:00 app[worker.1]: [2012-07-09 11:46:23,852: ERROR/MainProcess] Consumer: Connection Error: [Errno 111] Connection refused. Trying again in 2 seconds...
I should note that my Heroku config vars DO have a CLOUDAMQP_URL, so that shouldn't be a problem?
I would appreciate it if anyone who has used CloudAMQP with Django/Heroku could give me some guidance about how to make sure that Celery can connect with the broker.
You're probably exceeding the 3 concurrent connections limit of the free plan. Set the BROKER_POOL_LIMIT to 1 and it should work a lot better.
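A minimal sketch of what that could look like in settings.py, assuming the broker URL should come from the CLOUDAMQP_URL config var mentioned in the question. The helper function and the localhost fallback URL are assumptions for illustration, not something from the CloudAMQP docs.

```python
import os


def broker_settings(environ=None):
    """Build broker settings from the environment (hypothetical helper)."""
    if environ is None:
        environ = os.environ
    return {
        # Use the Heroku config var when present; fall back to a local broker.
        "BROKER_URL": environ.get("CLOUDAMQP_URL", "amqp://guest:guest@localhost:5672//"),
        # Keep the connection pool small to stay under the plan's connection limit.
        "BROKER_POOL_LIMIT": 1,
    }
```

In settings.py the resulting values would be assigned to BROKER_URL and BROKER_POOL_LIMIT. If BROKER_URL is never set from CLOUDAMQP_URL, Celery falls back to a broker on localhost, which could also explain a "Connection refused" error on Heroku.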
Make sure that you have this at the top of your settings.py file.
import djcelery
djcelery.setup_loader()