Lots of socket errors with celery eventlet tasks - django

I'm getting a lot of "IOError: Socket closed" exceptions from amqplib.client_0_8.method_framing.read_method when running my celery workers with the --pool=eventlet option. I'm also seeing a lot of timeout exceptions from eventlet.hubs.hub.switch.
I'm using an async_manage.py script similar to the one at https://gist.github.com/821848, running the workers like:
./async_manage.py celeryd_detach -E --pool=eventlet --concurrency=120 --logfile=<path>
Is this a known issue, or is there something wrong with my configuration or setup?
I'm running djcelery 2.2.4, Django 1.3, and eventlet 0.9.15.

The problem was a side effect of some code that was blocking. I managed to detect the blocking code using the eventlet option described in this article.
There were 2 places where blocking was occurring: DNS lookups and MySQL database access. I managed to resolve the first by installing the dnspython package, and the second by using the undocumented MySQLdb option in eventlet:
import eventlet

# Patch the standard library (sockets, threads, time, etc.) first
eventlet.monkey_patch()
# Undocumented flag: also patch MySQLdb so database calls yield to the hub
eventlet.monkey_patch(MySQLdb=True)
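For reference, eventlet ships a blocking-detection switch that can surface this kind of problem; the exact option from the article isn't reproduced here, so treat this as a sketch and enable it only while debugging:

import eventlet
import eventlet.debug

eventlet.monkey_patch()

# Print a traceback whenever a greenthread blocks the hub for longer than
# `resolution` seconds; this is a debugging aid and adds overhead.
eventlet.debug.hub_blocking_detection(True, resolution=1)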

Related

APScheduler running multiple times for the amount of gunicorn workers

I have a django project with APScheduler built into it. I have now moved to the production environment, so I bound it to gunicorn and nginx in the process. Gunicorn has 3 workers. The problem is that gunicorn starts APScheduler in each worker and runs the scheduled job 3 times instead of only once.
I have seen similar questions here; it seems to be a common problem. Even the original APScheduler documentation acknowledges the problem and offers no way of fixing it:
https://apscheduler.readthedocs.io/en/stable/faq.html#how-do-i-share-a-single-job-store-among-one-or-more-worker-processes
I saw in other threads that people recommended adding --preload to the gunicorn settings. But I read that --preload starts the workers with the current code and does not reload when the code has changed (see "when not to preload" in the link below):
https://www.joelsleppy.com/blog/gunicorn-application-preloading/
I also saw someone recommend binding a TCP socket for APScheduler. I did not understand it fully, but basically the idea is to try to bind a socket each time APScheduler is initiated; the second and third workers then hit that bound socket and get a socket error. Sort of a
try:
    "bind socket somehow"
except socketerror:
    print("socket already exists")
else:
    "run apscheduler module"
configuration. Does anyone know how to do this, or whether it would actually work?
Another workaround I thought of is simply removing APScheduler and doing it with the server's cron function. I am using DigitalOcean, so I could delete APScheduler and add a cron job that runs the module instead. However, I do not want to go that way because it would break the "unity" of the whole project and make it server-dependent. Does anyone have any other ideas?
Schedule module:
from apscheduler.schedulers.background import BackgroundScheduler
from RENDER.views import dailypuzzlefunc

def start():
    scheduler = BackgroundScheduler()
    scheduler.add_job(dailypuzzlefunc, 'cron', day="*", max_instances=2, id='dailyscheduler')
    scheduler.start()
In the app:
from django.apps import AppConfig

class DailypuzzleConfig(AppConfig):
    default_auto_field = "django.db.models.BigAutoField"
    name = "DAILYPUZZLE"

    def ready(self):
        from SCHEDULER import dailypuzzleschedule
        dailypuzzleschedule.start()
web:
python manage.py collectstatic --no-input;
gunicorn MasjidApp.wsgi --timeout 15 --preload
Use --preload. It's working well for me.
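For completeness, the socket-lock idea from the question can be sketched roughly as below. This is an illustration, not a tested recipe: the port number is arbitrary, it only works when all gunicorn workers run on the same host, and the import path is taken from the question's code.

import socket

_scheduler_lock_socket = None  # module-level reference so the bind is held for the process lifetime

def start_scheduler_once(port=47200):
    global _scheduler_lock_socket
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Only the first worker to reach this line can bind the port.
        sock.bind(("127.0.0.1", port))
    except OSError:
        print("scheduler already running in another worker")
        sock.close()
    else:
        _scheduler_lock_socket = sock
        from SCHEDULER import dailypuzzleschedule
        dailypuzzleschedule.start()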

Can I use Celery eventlet pools with Django+Postgres?

Does celery do any magic to make Django queries non-blocking when using an eventlet pool?
If not, is there a known good way to make it so?
Eventlet provides monkey_patch() to make as much as possible non-blocking, including plain sockets (which covers any pure-Python database library) and special cases for MySQLdb and psycopg. As far as I know, Celery's eventlet worker pool calls that patcher for you. If your queries are still blocking, try monkey_patch(psycopg=True).
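A minimal sketch of patching explicitly, placed before Django or psycopg2 is imported (for example at the top of the manage.py or wsgi entry point); whether you still need the explicit psycopg flag depends on your Celery and eventlet versions:

import eventlet

# Patch the standard library first, then psycopg2's C driver explicitly,
# before anything imports Django or opens a database connection.
eventlet.monkey_patch()
eventlet.monkey_patch(psycopg=True)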

Getting broker started with django-celery

This is my first time using Celery, so this might be a really easy question. I'm following the tutorial. I added BROKER_URL = "amqp://guest:guest@localhost:5672/" to my settings file. I added the simple task to my app. Now I start the worker process with
manage.py celeryd --loglevel=info --settings=settings
The --settings=settings flag is needed on Windows machines; otherwise django-celery can't find the settings.
I get
[Errno 10061] No connection could be made because the target machine actively refused it. Trying again in 2 seconds...
So it seems like the worker is not able to connect to the broker. Do I have to start the broker? Is it automatically started with manage.py runserver? Do I have to install something besides django-celery? Do I have to do something like manage.py runserver BROKER_URL?
Any pointers would be much appreciated.
You need to install a broker first, or try using the Django database as the broker.
I don't recommend the Django database broker in production, though. Redis is fine, but it may be a problem to run it on Windows.
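For the Django-database option mentioned above, the configuration in the django-celery era looked roughly like this (treat the app name and URL as a sketch; they varied between Celery/Kombu versions):

# settings.py
INSTALLED_APPS += (
    "djcelery",
    "kombu.transport.django",  # lets Celery use the Django database as its broker
)

BROKER_URL = "django://"
# then run `python manage.py syncdb` to create the queue tables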

Does django's runserver option provide a hook for running other restart scripts?

I've recently been playing around with django and celery. One annoying thing during development is the fact that I have to restart the celery daemon each time I modify a task. When I'm developing, I usually like to use 'manage.py runserver' which automatically reloads the django framework on modifications to my apps.
Is there a way to add a hook to the reloading process that runserver does so that it automatically restarts the celery daemon I have running?
Alternatively, does celery have a similar monitor-and-reload-on-change mode that I should be using for development?
Django-supervisor works very well for this purpose. You can have it start the Django server, Celery, and anything else you need, and have different configurations for development and production servers. It also knows to reload the celery daemon when your code changes.
https://github.com/rfk/django-supervisor
I believe you can set CELERY_ALWAYS_EAGER to true.
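With that setting, tasks run synchronously in the web process during development, so no worker has to be restarted at all. A sketch using the pre-4.0 setting names that django-celery uses:

# settings used only for local development
CELERY_ALWAYS_EAGER = True
# re-raise exceptions from eagerly executed tasks instead of swallowing them
CELERY_EAGER_PROPAGATES_EXCEPTIONS = True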
Yes. Django provides an autoreload hook, which can be used to restart other scripts.
Here is a simple management command which prints a message on reload:
from django.core.management.base import BaseCommand
from django.utils import autoreload

def reload():
    print('Code changed. Auto reloading...')

class Command(BaseCommand):
    def handle(self, *args, **options):
        # autoreload.main runs the given callable under Django's code-change
        # reloader, re-running it whenever a source file changes.
        autoreload.main(reload)
Save this as a management command (e.g. management/commands/reload.py) and run it with python manage.py reload. A management command to reload celery workers is available here.
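The linked command isn't reproduced here, but the same idea extended to restart the worker could look roughly like this; the process-matching pattern and the celeryd command line are illustrative assumptions, and pkill assumes a Unix-like system:

import subprocess

from django.core.management.base import BaseCommand
from django.utils import autoreload

def restart_celery():
    print('Code changed. Restarting celery worker...')
    # Stop any worker started from this project, then start a fresh one.
    subprocess.call(['pkill', '-f', 'manage.py celeryd'])
    subprocess.Popen(['python', 'manage.py', 'celeryd', '--loglevel=info'])

class Command(BaseCommand):
    def handle(self, *args, **options):
        autoreload.main(restart_celery)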
Celery doesn't have any feature for reloading code or auto-restarting when the code changes, so you have to restart it manually.
There isn't a way to add a hook, and I don't think it's worth editing Django's source code just to perform a restart.
Personally, while developing I prefer to watch Celery's shell output, which is colorized, rather than tailing the logs; it's more readable.
Celery 2.5 has an experimental runtime option --autoreload that could be used for this purpose, too. There's more detail in the release notes. That being said, I think django-supervisor (via @Lee Semel) looks like the better way of doing things. I thought I would post this alternative here in case other readers do not want to configure another app for asynchronous processing.

How do I get broadcast to work with djcelery+ghettoq

I am using djcelery 2.1.4 with ghettoq 0.4.5 and django 1.2.3 and I am able to run tasks all day long, but when I try to use any broadcast functionality it fails without errors. Take the simplest case -- I run celeryd:
python manage.py celeryd
The daemon starts and I try to run a ping:
>>> from celery.task.control import ping
>>> ping()
[]
I can see the message that ping created appear in the database, but apparently none of the nodes are picking it up? Am I doing something wrong here? Does broadcast not work with ghettoq?
Broadcast is not supported by ghettoq.
The next Celery version (2.2) will support broadcast with the Redis transport; adding support for the database transport should be simple after that.
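Once you are on a Celery version whose Redis transport supports broadcast, pointing the broker at Redis is just a settings change; the host, port, and database number below are placeholder assumptions:

# settings.py (newer Celery versions)
BROKER_URL = "redis://localhost:6379/0"

# or, on older Celery/django-celery versions, the individual settings:
# BROKER_BACKEND = "redis"
# BROKER_HOST = "localhost"
# BROKER_PORT = 6379
# BROKER_VHOST = "0"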