Django with Celery on Digital Ocean

The Objective
I am trying to use Celery in combination with Django. The objective is to set up Celery on a Django web application (deployed test environment) to send scheduled emails. The web application already sends emails. The ultimate objective is to add functionality to send out emails at a user-selected date and time. However, before we get there, the first step is to invoke the delay() function to prove that Celery is working.
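For reference, the two relevant calls are standard Celery API. A minimal sketch of the difference between "run now" and "run at a user-selected time" (send_email_task is the task defined later in this post; the two-hour offset is purely illustrative):
from datetime import datetime, timedelta, timezone

from events.tasks import send_email_task

# Enqueue for immediate execution by a worker (the "prove Celery works" step).
send_email_task.delay()

# Enqueue for a specific date-time (the ultimate objective).
send_at = datetime.now(timezone.utc) + timedelta(hours=2)
send_email_task.apply_async(eta=send_at)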
Tutorials and Documentation Used
I am new to Celery and have been learning through the following resources:
First Steps With Celery-Django documentation: https://docs.celeryq.dev/en/stable/django/first-steps-with-django.html#using-celery-with-django
A YouTube video on sending email from Django through Celery via a Redis broker: https://www.youtube.com/watch?v=b-6mEAr1m-A
The Redis/Celery droplet was configured per the following tutorial: https://www.digitalocean.com/community/tutorials/how-to-install-and-secure-redis-on-ubuntu-20-04
I have spent several days reviewing existing Stack Overflow questions on Django/Celery and tried a number of suggestions. However, I have not found a question specifically describing this effect in the Django/Celery/Redis/Digital Ocean context. The current situation is described below.
What Is Currently Happening?
The current outcome, as of this post, is that the web application times out, suggesting that the Django app is not successfully connecting with Celery to send the email. Please note that towards the bottom of the post is the output of the Celery worker being started successfully (manually, from within the Django app's console), including a listing of the expected tasks.
The Stack In Use
Python 3.11 and Django 4.1.6: Running on the Digital Ocean App platform
Celery 5.2.7 and Redis 4.4.2 on Ubuntu 20.04: Running on a separate Digital Ocean Droplet
The Django project name is "Whurthy".
Celery Setup Code Snippets
The following snippets are primarily from the Celery-Django documentation: https://docs.celeryq.dev/en/stable/django/first-steps-with-django.html#using-celery-with-django
Whurthy/celery.py
import os
from celery import Celery
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'Whurthy.settings')
app = Celery('Whurthy')
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()
@app.task(bind=True)
def debug_task(self):
    print(f'Request: {self.request!r}')
Whurthy/__init__.py
from .celery import app as celery_app
__all__ = ('celery_app',)
Application Specific Code Snippets
Whurthy/settings.py
CELERY_BROKER_URL = 'redis://SNIP_FOR_PRIVACY:6379'
CELERY_RESULT_BACKEND = 'redis://SNIP_FOR_PRIVACY:6379'
CELERY_TASK_TRACK_STARTED = True
CELERY_TASK_TIME_LIMIT = 30 * 60
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TIMEZONE = TIME_ZONE
I have replaced the actual IP with the string SNIP_FOR_PRIVACY for obvious reasons. However, if this were incorrect I would not get the output below.
I have also commented out the bind and requirepass redis configuration settings to support troubleshooting during development. This makes the URL as simple as possible and rules out either the incoming IP or password as being the cause of this problem.
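For completeness, once requirepass is re-enabled the broker URL needs to carry the password. A minimal sketch (the password value is a placeholder):
CELERY_BROKER_URL = 'redis://:REDIS_PASSWORD@SNIP_FOR_PRIVACY:6379/0'
CELERY_RESULT_BACKEND = 'redis://:REDIS_PASSWORD@SNIP_FOR_PRIVACY:6379/1'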
events/tasks.py
from celery import shared_task
from django.core.mail import send_mail
@shared_task
def send_email_task():
    send_mail(
        'Celery Task Worked!',
        'This is proof the task worked!',
        'notifications@domain.com',
        ['my_email@domain.com'],
    )
    return
For privacy reasons I have changed the to and from email addresses. However, please note that this function works before adding .delay() to the following snippet. In other words, the Django app sends an email up until I add .delay() to invoke Celery.
events/views.py (extract)
from .tasks import send_email_task
from django.shortcuts import render
def home(request):
    send_email_task.delay()
    return render(request, 'home.html', context)
The above is just the relevant extract of a larger file to show the specific line of code calling the function. The Django web application is working until delay() is appended to the function call, and so I have not included other Django project file snippets.
Output from Running celery -A Whurthy worker -l info in the Digital Ocean Django App Console
Ultimately, I want to Dockerize this command, but for now I am running the above command manually. Below is the output within the Django App console, and it appears consistent with the tutorial and other examples of what a successfully configured Celery instance would look like.
<SNIP>
-------------- celery@whurthy-staging-b8bb94b5-xp62x v5.2.7 (dawn-chorus)
--- ***** -----
-- ******* ---- Linux-4.4.0-x86_64-with-glibc2.31 2023-02-05 11:51:24
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app: Whurthy:0x7f92e54191b0
- ** ---------- .> transport: redis://SNIP_FOR_PRIVACY:6379//
- ** ---------- .> results: redis://SNIP_FOR_PRIVACY:6379/
- *** --- * --- .> concurrency: 8 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
-------------- [queues]
.> celery exchange=celery(direct) key=celery
[tasks]
. Whurthy.celery.debug_task
. events.tasks.send_email_task
This appears to confirm that the Digital Ocean droplet is starting a Celery worker successfully (suggesting that the code snippets above are correct) and that the Redis configuration is correct. The two tasks listed when starting Celery are consistent with expectations. However, I am clearly missing something, and cannot rule out that the way Digital Ocean runs droplets is getting in the way.
The baseline test is that the web application sends out an email through the function call. However, as soon as I add .delay() the web page request times out.
I have endeavoured to replicate all that is relevant. I welcome any suggestions to resolve this issue or constructive criticism to improve this question.
Troubleshooting Attempts
Attempt 1
Through the D.O. app console I ran python manage.py shell
I then entered the following into the shell:
>>> from events.tasks import send_email_task
>>> send_email_task
<@task: events.tasks.send_email_task of Whurthy at 0x7fb2f2348dc0>
>>> send_email_task.delay()
At this point the shell hangs/does not respond until I keyboard interrupt.
I then tried the following:
>>> send_email_task.apply()
<EagerResult: 90b7d92c-4f01-423b-a16f-f7a7c75a545c>
AND, the task sends an email!
So, the connection between Django-Redis-Celery appears to work. However, invoking delay() causes the web app to time out and the email to NOT be sent.
So either delay() isn't putting the task on the queue, or the call is getting stuck. In either case, this does not appear to be a connection issue. However, because apply() runs the task in the caller's thread, it doesn't resolve the underlying problem.
That does suggest this may be an issue with the broker, which in turn may be an issue with the settings...
I made minor changes to the broker settings in settings.py:
CELERY_BROKER_URL = 'redis://SNIP_FOR_PRIVACY:6379/0'
CELERY_RESULT_BACKEND = 'redis://SNIP_FOR_PRIVACY:6379/1'
delay() still hangs in the shell.
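At this point, a quick way to rule the broker in or out from the same console is to ask for a connection explicitly instead of letting delay() retry silently. A minimal sketch, run inside python manage.py shell and assuming the Whurthy layout above:
from Whurthy.celery import app as celery_app

# Raises an error after a few retries instead of hanging indefinitely
# if the broker cannot be reached.
with celery_app.connection() as conn:
    conn.ensure_connection(max_retries=3)
    print('Broker reachable:', conn.as_uri())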
Attempt 2
I discovered that on Digital Ocean the droplet's public IPv4 address does not work in the broker URL. By replacing it with the private IP in the CELERY_BROKER_URL setting, I was able to get delay() working within the Django app's shell.
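The change amounts to something like the following in settings.py (a sketch; REDIS_PRIVATE_HOST is a hypothetical environment variable holding the droplet's private IP):
import os

REDIS_HOST = os.environ.get('REDIS_PRIVATE_HOST', 'SNIP_FOR_PRIVACY')  # hypothetical variable name
CELERY_BROKER_URL = f'redis://{REDIS_HOST}:6379/0'
CELERY_RESULT_BACKEND = f'redis://{REDIS_HOST}:6379/1'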
However, while I can now get delay() working in the shell, returning to the original objective still fails. In other words, when loading the respective view, the web application hangs.
I am currently researching other approaches, and any suggestions are welcome. Given that I can now get Celery to work through the broker in the shell but not in the web application, I feel I have made some progress but am still without a solution.
As a side note, I am also trying to make this connection through a Digital Ocean Managed Redis DB, although that is presenting a completely different issue.

Ultimately, the answer I uncovered is a compromise: a workaround using a different Digital Ocean (D.O.) product. The workaround was to use a Managed Database (which simplifies things but gives you much less control) rather than a Droplet (which involves manual Linux/Redis installation and configuration, but gives you greater control). This isn't ideal for two reasons. First, it costs more ($15 vs. $6 base cost). Second, I would have preferred to work out how to set up Redis manually (and thus maintain greater control). However, I'll take a working solution over no solution for a very niche issue.
The steps to use a D.O. Managed Redis DB are:
Provision the managed Redis DB
Use the Public Network Connection String (since the connection string includes the password, I store it in an environment variable)
Ensure that you have the appropriate SSL settings in the celery.py file (snippet below)
celery.py
import os
import ssl

from celery import Celery

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'proj_name.settings')

app = Celery(
    'proj_name',
    broker_use_ssl={'ssl_cert_reqs': ssl.CERT_NONE},
    redis_backend_use_ssl={'ssl_cert_reqs': ssl.CERT_NONE},
)
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()

@app.task(bind=True)
def debug_task(self):
    print(f'Request: {self.request!r}')
settings.py
REDIS_URI = os.environ.get('REDIS_URI')
CELERY_BROKER_URL = f'{REDIS_URI}/0'
CELERY_RESULT_BACKEND = f'{REDIS_URI}/1'
CELERY_TASK_TRACK_STARTED = True
CELERY_TASK_TIME_LIMIT = 30 * 60
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TIMEZONE = TIME_ZONE
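As an aside, because config_from_object is used with namespace='CELERY', the same SSL options could instead live in settings.py rather than being passed to the Celery() constructor. A hedged equivalent:
import ssl

# These map to broker_use_ssl / redis_backend_use_ssl via the CELERY_ namespace.
CELERY_BROKER_USE_SSL = {'ssl_cert_reqs': ssl.CERT_NONE}
CELERY_REDIS_BACKEND_USE_SSL = {'ssl_cert_reqs': ssl.CERT_NONE}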

Related

APScheduler running multiple times for the number of gunicorn workers

I have a Django project with APScheduler built into it. I have now moved to the production environment, so I bound it with gunicorn and nginx in the process. Gunicorn has 3 workers. The problem is that gunicorn initiates APScheduler for each worker and runs the scheduled job 3 times instead of only once.
I have seen similar questions here; it seems to be a common problem. Even the official APScheduler documentation acknowledges the problem and offers no way of fixing it.
https://apscheduler.readthedocs.io/en/stable/faq.html#how-do-i-share-a-single-job-store-among-one-or-more-worker-processes
In other threads I saw people recommend adding --preload to the gunicorn settings. But I read that --preload initializes the workers with the current code and does not reload them when the code changes (see "when not to preload" in the link below).
https://www.joelsleppy.com/blog/gunicorn-application-preloading/
I also saw someone recommend binding a TCP socket for APScheduler. I did not understand it fully, but basically each worker tries to bind a socket when APScheduler is initiated; the second and third workers hit the already-bound socket and throw a socket error, so only the first worker runs the scheduler. Sort of:
import socket

try:
    # Only the first worker can bind this (arbitrary) local port.
    lock_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    lock_socket.bind(("127.0.0.1", 47200))
except socket.error:
    print("socket already exists; scheduler is running in another worker")
else:
    start()  # run the APScheduler module
Does anyone know how to do this, or whether it would actually work?
Another workaround I thought of is simply removing APScheduler and doing it with the server's cron instead. I am using Digital Ocean, so I could delete APScheduler and add a cron job that runs the module instead. However, I do not want to go that way because it would break the "unity" of the whole project and make it server-dependent. Does anyone have any more ideas?
Schedule module:
from apscheduler.schedulers.background import BackgroundScheduler
from RENDER.views import dailypuzzlefunc
def start():
    scheduler = BackgroundScheduler()
    scheduler.add_job(dailypuzzlefunc, 'cron', day="*", max_instances=2, id='dailyscheduler')
    scheduler.start()
In the app:
from django.apps import AppConfig
class DailypuzzleConfig(AppConfig):
    default_auto_field = "django.db.models.BigAutoField"
    name = "DAILYPUZZLE"

    def ready(self):
        from SCHEDULER import dailypuzzleschedule
        dailypuzzleschedule.start()
web:
python manage.py collectstatic --no-input;
gunicorn MasjidApp.wsgi --timeout 15 --preload
use --preload.
It's working well for me.
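If --preload doesn't fit (for example, because of the reloading caveat mentioned above), another commonly used workaround is to guard the scheduler start behind an environment variable so that only one process starts it. A minimal sketch (SCHEDULER_AUTOSTART is a hypothetical variable name):
import os

from django.apps import AppConfig

class DailypuzzleConfig(AppConfig):
    default_auto_field = "django.db.models.BigAutoField"
    name = "DAILYPUZZLE"

    def ready(self):
        # Only the process started with SCHEDULER_AUTOSTART=1 runs the jobs.
        if os.environ.get("SCHEDULER_AUTOSTART") == "1":
            from SCHEDULER import dailypuzzleschedule
            dailypuzzleschedule.start()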

Django Postgresql Heroku : Operational Error - 'FATAL too many connections for role "usename"'

I am running a web application using Django and Django Rest Framework on Heroku with a postgresql and redis datastore. I am on the free postgresql tier which is limited to 20 connections.
This hasn't been an issue in the past, but recently I started using django channels 2.0 and the daphne server (switched my Procfile from gunicorn to daphne like this tutorial) and now I have been running into all sort of weird problems.
The most critical is that connections to the database are being left open, so as the app runs the number of connections keeps increasing until it reaches 20 and gives me the following error message: Operational Error - 'FATAL too many connections for role "usename"'
Then I have to manually go to the shell and type heroku pg:killall each time. This is obviously not a feasible solution, and this is production, so my users can't access the site and get 500 errors. I would really appreciate any help.
I have tried:
Adding this to my different views in different places
from django.db import connections

connections.close_all()
for con in connections:
    con.close()
I also tried running SELECT * FROM pg_stat_activity and saw a bunch of rows, but I have no idea what to make of them.
We figured out what the problem was. I assume that you are using dj_database_url as in the Heroku manual. All you have to do is drop conn_max_age:
db_from_env = dj_database_url.config()
Here is the solution:
Nowadays Heroku provides the django_heroku package, which handles the default Django-on-Heroku app configuration. When you call django_heroku.settings(locals()) at the end of your settings.py, the default CONN_MAX_AGE database setting is 600 seconds, whereas Django's own default is 0, meaning all database connections are closed after each request completes. If you don't override CONN_MAX_AGE after calling django_heroku.settings(locals()), connections stay alive for 600 seconds, causing this trouble.
Put these lines at the end of your settings.py; the override must come after the django_heroku call:
django_heroku.settings(locals())
DATABASES['default']['CONN_MAX_AGE'] = 0
I think I may have solved it.
One of the changes I made was modifying how I closed my connections.
The key is to close old connections before and after various view functions.
from django.db import close_old_connections
from django.views.decorators.csrf import csrf_exempt
from rest_framework.decorators import api_view

@csrf_exempt
@api_view(['GET', ])
def search(request):
    close_old_connections()
    # do stuff
    close_old_connections()
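The same idea can be wrapped in a small decorator so that each view doesn't have to repeat the calls. A sketch (close_db_connections is a hypothetical helper name):
from functools import wraps

from django.db import close_old_connections

def close_db_connections(view):
    @wraps(view)
    def wrapper(request, *args, **kwargs):
        close_old_connections()
        try:
            return view(request, *args, **kwargs)
        finally:
            close_old_connections()
    return wrapper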

My celery task never returns

I am just starting to learn about Django and have just discovered celery to run async background tasks.
I have a dummy project which I pilfered off the internet with a sample task as follows:
from djcelery import celery
from time import sleep

@celery.task
def sleeptask(i):
    sleep(i)
    return i
Now in my view, I have the following:
def test_celery(request):
    result = tasks.sleeptask.delay(10)
    return HttpResponse(result.task_id)
This runs fine and when I point the browser to it, I get some random string like 93463e9e-d8f5-46b2-8544-8d4b70108b0d which I am guessing is the task id.
However, when I do this:
def test_celery(request):
    result = tasks.sleeptask.delay(10)
    return HttpResponse(result.get())
The web browser spins with the message "Connecting..." and never returns. I was under the impression this would block until the task has run and then return the result, but that does not seem to be the case. What am I doing wrong?
Another question: the way I am doing it, will the task run asynchronously, i.e. not block while the task is running?
EDIT
In my settings.py file I have:
import djcelery
# Setup celery
djcelery.setup_loader()
BROKER_URL = 'redis://localhost:6379/0'
On the Django side, I do not get any errors:
System check identified no issues (0 silenced).
September 27, 2016 - 18:13:12
Django version 1.9.5, using settings 'myproject.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
Thanks to the hints in the comments, I was finally able to solve the problem. I had to add the following to my settings.py file:
CELERY_IMPORTS = ('myproject.tasks',)
I also needed to run the worker as:
python manage.py celery worker
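As a side note on the original result.get() hang: a common pattern is to return the task id immediately and poll for the result in a second view instead of blocking the request. A sketch using the plain Celery AsyncResult API (the view names here are illustrative):
from celery.result import AsyncResult
from django.http import JsonResponse

from myproject import tasks

def start_sleeptask(request):
    result = tasks.sleeptask.delay(10)
    return JsonResponse({'task_id': result.task_id})

def sleeptask_status(request, task_id):
    result = AsyncResult(task_id)
    return JsonResponse({
        'ready': result.ready(),
        'value': result.result if result.ready() else None,
    })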

DJCelery not storing task results in Django SQLite DB

DJCelery is not storing task results in my Django SQLite DB.
I have an existing Django project that I have started setting up Celery with RabbitMQ on. I started my RabbitMQ server. I can run Celery with python manage.py celeryd --verbosity=2 --loglevel=DEBUG and Celerybeat with python manage.py celerybeat --verbosity=2 --loglevel=DEBUG. Everything starts up without error, and my periodic example task also runs without error.
I used pip install django-celery to install. I have djcelery in my installed apps and ran python manage.py migrate djcelery. I added CELERY_RESULT_BACKEND='djcelery.backends.database:DatabaseBackend' to the end of my settings.py file.
When I run python manage.py celeryd --verbosity=2 --loglevel=DEBUG, the startup text shows:
...
- ** ---------- .> transport: amqp://guest:**@localhost:5672//
- ** ---------- .> results:
- *** --- * --- .> concurrency: 1 (prefork)
...
The results section being blank indicates to me that the configuration isn't right somehow but I can't figure out how. I tried using app.conf.update in my celery.py file to set the CELERY_RESULT_BACKEND but got the same results. I left out CELERY_RESULT_BACKEND, but that defaulted to no results. I also tried putting 'database' instead of 'djcelery.backends.database:DatabaseBackend' but that indicated it was attempting to use sqlalchemy instead of djcelery.
When I run python manage.py runserver I can see a DJCELERY section with tables Crontabs, Intervals, Periodic tasks, Tasks, and Workers. There isn't any data on my Tasks though.
Can anyone point out what could be wrong or missing? Thank you for your time.
tutuDajuju led me in the right direction - there's more to it so I'll write it all up. I abandoned using djcelery in favor of sqlalchemy with a separate back-end database outside of Django.
Inside my venv I ran pip install sqlalchemy. I then put CELERY_RESULT_BACKEND = 'db+sqlite:///celery_results.sqlite3' in settings.py. This connected Celery to the new SQLite database to use for state/results.
Running celery -A <projectapp>.celery:app worker then showed the database in the startup message:
...
- ** ---------- .> transport: amqp://guest:**@localhost:5672//
- ** ---------- .> results: sqlite:///celery_results.sqlite3
- *** --- * --- .> concurrency: 1 (prefork)
...
At first I was worried because the database file wasn't created in my Django project directory. This was because I hadn't run a task yet. Once I ran my first task, the database and tables were created correctly.
I verified task results were stored in the database by running a script:
from sqlalchemy import create_engine

engine = create_engine("sqlite:///celery_results.sqlite3")
connection = engine.connect()
result = connection.execute("select * from celery_taskmeta")
for row in result:
    print(row)
connection.close()
I found the table names with:
print(engine.table_names())
Hope this helps someone out.
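(On newer SQLAlchemy releases engine.table_names() has been removed; a minimal equivalent using the inspection API, for anyone following along:)
from sqlalchemy import create_engine, inspect

engine = create_engine("sqlite:///celery_results.sqlite3")
print(inspect(engine).get_table_names())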
The Celery docs mention a few different syntaxes, and I'm not sure what you tried is valid. Try the following:
# use a connection string
CELERY_RESULT_BACKEND = 'db+sqlite:///foo.db'
Update:
As in your comment, the docs also mention that it is possible to use the Django ORM/cache as a result backend. To do this, you must pass the setting you tried into your Celery app config:
app.conf.update(
    CELERY_RESULT_BACKEND='djcelery.backends.database:DatabaseBackend',
)
Alternatively, the docs also explain
If you have connected Celery to your Django settings then you can add
this directly into your settings module (without the app.conf.update
part)
This is a reference to the configuration of the Celery app detailed in the same page. This basically means that if you configured your celery app in a module, and you add the Django settings module as a configuration source for Celery, then setting CELERY_RESULT_BACKEND in your Django settings module, as you did, will also work.
file: proj/proj/celery.py
# important to pass the Django settings to your celery app
app = Celery('proj')
app.config_from_object('django.conf:settings')
file: proj/proj/settings.py
CELERY_RESULT_BACKEND='djcelery.backends.database:DatabaseBackend'

django project with twisted and run as "Daemon"

For the last two days I have been trying to find a way to run a working Django project under Twisted. After detailed searching I found several methods to configure it, but most of them deal with how to run the app via the command line, not as a daemon. I want to run the Django project as a daemon.
I tried following links to implement this,
Twisted: Creating a ThreadPool and then daemonizing leads to uninformative hangs
http://www.robgolding.com/blog/2011/02/05/django-on-twistd-web-wsgi-issue-workaround/
But this is also not working for me. With this method the TCP server does not even listen on the given port.
Please help me to figure it out.
UPDATE
I'm sorry for the missing information. Here are my objectives.
I'm a beginner in the Twisted world, so first I'm trying to get my working Django project configured under Twisted. Currently it works well on the Django test server and on Apache via mod_wsgi.
To configure it with Twisted I used the binding code given below; that code is a combination of the two samples found in the links in the first post.
So, in order to integrate the Django app with Twisted, I used the following code, file name "server.py".
import sys
import os
from twisted.application import internet, service
from twisted.web import server, resource, wsgi, static
from twisted.python import threadpool
from twisted.internet import reactor
from django.conf import settings
import twresource # This file hold implementation of "Class Root".
class ThreadPoolService(service.Service):
    def __init__(self, pool):
        self.pool = pool

    def startService(self):
        service.Service.startService(self)
        self.pool.start()

    def stopService(self):
        service.Service.stopService(self)
        self.pool.stop()

class Root(resource.Resource):
    def __init__(self, wsgi_resource):
        resource.Resource.__init__(self)
        self.wsgi_resource = wsgi_resource

    def getChild(self, path, request):
        path0 = request.prepath.pop(0)
        request.postpath.insert(0, path0)
        return self.wsgi_resource
PORT = 8080
# Environment setup for your Django project files:
#insert it to first so our project will get first priority.
sys.path.insert(0,"django_project")
sys.path.insert(0,".")
os.environ['DJANGO_SETTINGS_MODULE'] = 'django_project.settings'
from django.core.handlers.wsgi import WSGIHandler
def wsgi_resource():
    pool = threadpool.ThreadPool()
    pool.start()
    # Allow Ctrl-C to get you out cleanly:
    reactor.addSystemEventTrigger('after', 'shutdown', pool.stop)
    wsgi_resource = wsgi.WSGIResource(reactor, pool, WSGIHandler())
    return wsgi_resource
# Twisted Application Framework setup:
application = service.Application('twisted-django')
# WSGI container for Django, combine it with twisted.web.Resource:
# XXX this is the only 'ugly' part: see the 'getChild' method in twresource.Root
wsgi_root = wsgi_resource()
root = Root(wsgi_root)
#multi = service.MultiService()
#pool = threadpool.ThreadPool()
#tps = ThreadPoolService(pool)
#tps.setServiceParent(multi)
#resource = wsgi.WSGIResource(reactor, tps.pool, WSGIHandler())
#root = twresource.Root(resource)
#Admin Site media files
#staticrsrc = static.File(os.path.join(os.path.abspath("."), "/usr/haridas/eclipse_workplace/skgargpms/django/contrib/admin/media/"))
#root.putChild("admin/media", staticrsrc)
# Serve it up:
main_site = server.Site(root)
#internet.TCPServer(PORT, main_site).setServiceParent(multi)
internet.TCPServer(PORT, main_site).setServiceParent(application)
#EOF.
Using the above code it worked well from the command line with "twistd -ny server.py", but when we run it as a daemon with "twistd -y server.py" it hangs, although the app is listening on port 8080 (I can access it using telnet).
I found some fixes for this hanging issue on Stack Overflow itself, which led me to use the code sections given below (commented out in the above server.py file):
multi = service.MultiService()
pool = threadpool.ThreadPool()
tps = ThreadPoolService(pool)
tps.setServiceParent(multi)
resource = wsgi.WSGIResource(reactor, tps.pool, WSGIHandler())
root = twresource.Root(resource)
and :-
internet.TCPServer(PORT, main_site).setServiceParent(multi)
instead of using the:-
wsgi_root = wsgi_resource()
root = Root(wsgi_root)
and :-
internet.TCPServer(PORT, main_site).setServiceParent(application)
The modified method also didn't help me avoid the hanging issue. Is there anybody out there who has successfully run Django apps under Twisted in daemon mode?
Did I make any mistakes while combining these code samples? I have only just started to learn the Twisted architecture in detail. Please help me to solve this problem.
Thanks and Regards,
Haridas N.
Note: I'm looking for a Twisted Application Configuration (.tac) file that integrates the Django app with Twisted and also runs without any problem in daemon mode.
Thank you,
Haridas N.
twistd is the Twisted Daemonizer. Anything you run with twistd will be easy to daemonize. All you have to do is not pass the --nodaemon option.
As far as why your code is "not working", you need to provide more details about what you did, what you expected to happen, and how what actually happened differed from your expectations. Otherwise, only a magician can answer your question.
Since you said the TCP port doesn't even get set up, the only guess I can think of is that you're trying to listen on a privileged port (such as 80) without having permission to do so (i.e., you're not root and you're not using authbind or something similar).
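A quick way to test that privileged-port guess from a plain Python shell, independent of Twisted (a minimal sketch; the port numbers are illustrative):
import socket

for port in (80, 8080):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(("0.0.0.0", port))
        print(f"port {port}: bind ok")
    except OSError as exc:
        # Binding below 1024 as a non-root user typically fails with "Permission denied".
        print(f"port {port}: {exc}")
    finally:
        s.close()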