My celery task never returns - django

I am just starting to learn about Django and have just discovered celery to run async background tasks.
I have a dummy project which I pilfered off the internet with a sample task as follows:
from djcelery import celery
from time import sleep

@celery.task
def sleeptask(i):
    sleep(i)
    return i
Now in my view, I have the following:
def test_celery(request):
    result = tasks.sleeptask.delay(10)
    return HttpResponse(result.task_id)
This runs fine and when I point the browser to it, I get some random string like 93463e9e-d8f5-46b2-8544-8d4b70108b0d which I am guessing is the task id.
However, when I do this:
def test_celery(request):
    result = tasks.sleeptask.delay(10)
    return HttpResponse(result.get())
The web browser goes into a loop with the message "Connecting..." and never returns. I was under the impression that this would block until the task had run and then return the result, but that does not seem to be the case. What am I doing wrong?
Another question: the way I am doing it, will the task run asynchronously, i.e. without blocking while it runs?
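As an aside, result.get() does block until the task finishes once a result backend is configured; a common non-blocking pattern is to return the task id from the first view and check on it from a second one. This is only a sketch, assuming the task id is passed in via the URL:

from celery.result import AsyncResult
from django.http import HttpResponse

def task_status(request, task_id):
    # Look up the task by id without blocking the request
    result = AsyncResult(task_id)
    if result.ready():
        return HttpResponse('done: %s' % (result.result,))
    return HttpResponse('still running')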
EDIT
In my settings.py file I have:
import djcelery
# Setup celery
djcelery.setup_loader()
BROKER_URL = 'redis://localhost:6379/0'
On the Django side, I do not get any errors:
System check identified no issues (0 silenced).
September 27, 2016 - 18:13:12
Django version 1.9.5, using settings 'myproject.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.

Thanks to the hints in the comments, I was finally able to solve the problem. I had to add the following to my settings.py file:
CELERY_IMPORTS = ('myproject.tasks',)
I also needed to run the worker as:
python manage.py celery worker
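For completeness, here is a settings.py sketch with the pieces above combined; the result backend line is an assumption (reusing the same Redis instance) rather than something the original post confirms:

import djcelery

djcelery.setup_loader()

BROKER_URL = 'redis://localhost:6379/0'
CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'  # assumption: keep results in the same Redis
CELERY_IMPORTS = ('myproject.tasks',)               # so the worker registers the task module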

Related

Django with Celery on Digital Ocean

The Objective
I am trying to use Celery in combination with Django. The objective is to set up Celery on a Django web application (deployed test environment) to send scheduled emails. The web application already sends emails. The ultimate objective is to add functionality to send out emails at a user-selected date-time. However, before we get there, the first step is to invoke the delay() function to prove that Celery is working.
Tutorials and Documentation Used
I am new to Celery and have been learning through the following resources:
First Steps With Celery-Django documentation: https://docs.celeryq.dev/en/stable/django/first-steps-with-django.html#using-celery-with-django
A YouTube video on sending email from Django through Celery via a Redis broker: https://www.youtube.com/watch?v=b-6mEAr1m-A
The Redis/Celery droplet was configured per the following tutorial https://www.digitalocean.com/community/tutorials/how-to-install-and-secure-redis-on-ubuntu-20-04
I have spent several days reviewing existing Stack Overflow questions on Django/Celery and tried a number of suggestions. However, I have not found a question specifically describing this effect in the Django/Celery/Redis/Digital Ocean context. The current situation is described below.
What Is Currently Happening?
The current outcome, as of this post, is that the web application times out, suggesting that the Django app is not successfully connecting with Celery to send the email. Please note that towards the bottom of the post is the output of the Celery worker being successfully started manually from within the Django app's console, including a listing of the expected tasks.
The Stack In Use
Python 3.11 and Django 4.1.6: Running on the Digital Ocean App platform
Celery 5.2.7 and Redis 4.4.2 on Ubuntu 20.04: Running on a separate Digital Ocean Droplet
The Django project name is "Whurthy".
Celery Setup Code Snippets
The following snippets are primarily from the Celery-Django documentation: https://docs.celeryq.dev/en/stable/django/first-steps-with-django.html#using-celery-with-django
Whurthy/celery.py
import os

from celery import Celery

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'Whurthy.settings')

app = Celery('Whurthy')
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()

@app.task(bind=True)
def debug_task(self):
    print(f'Request: {self.request!r}')
Whurthy/__init__.py
from .celery import app as celery_app
__all__ = ('celery_app',)
Application Specific Code Snippets
Whurthy/settings.py
CELERY_BROKER_URL = 'redis://SNIP_FOR_PRIVACY:6379'
CELERY_RESULT_BACKEND = 'redis://SNIP_FOR_PRIVACY:6379'
CELERY_TASK_TRACK_STARTED = True
CELERY_TASK_TIME_LIMIT = 30 * 60
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TIMEZONE = TIME_ZONE
I have replaced the actual IP with the string SNIP_FOR_PRIVACY for obvious reasons. However, if this were incorrect I would not get the output below.
I have also commented out the bind and requirepass redis configuration settings to support troubleshooting during development. This makes the URL as simple as possible and rules out either the incoming IP or password as being the cause of this problem.
events/tasks.py
from celery import shared_task
from django.core.mail import send_mail

@shared_task
def send_email_task():
    send_mail(
        'Celery Task Worked!',
        'This is proof the task worked!',
        'notifications@domain.com',
        ['my_email@domain.com'],
    )
    return
For privacy reasons I have changed the to and from email addresses. However, please note that this function works before adding .delay() to the following snippet. In other words, the Django app sends an email up until I add .delay() to invoke Celery.
events/views.py (extract)
from .tasks import send_email_task
from django.shortcuts import render

def home(request):
    send_email_task.delay()
    return render(request, 'home.html', context)
The above is just the relevant extract of a larger file to show the specific line of code calling the function. The Django web application is working until delay() is appended to the function call, and so I have not included other Django project file snippets.
Output from Running celery -A Whurthy worker -l info in the Digital Ocean Django App Console
Ultimately, I want to Dockerize this command, but for now I am running the above command manually. Below is the output within the Django App console, and it appears consistent with the tutorial and other examples of what a successfully configured Celery instance would look like.
<SNIP>
-------------- celery@whurthy-staging-b8bb94b5-xp62x v5.2.7 (dawn-chorus)
--- ***** -----
-- ******* ---- Linux-4.4.0-x86_64-with-glibc2.31 2023-02-05 11:51:24
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app: Whurthy:0x7f92e54191b0
- ** ---------- .> transport: redis://SNIP_FOR_PRIVACY:6379//
- ** ---------- .> results: redis://SNIP_FOR_PRIVACY:6379/
- *** --- * --- .> concurrency: 8 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
-------------- [queues]
.> celery exchange=celery(direct) key=celery
[tasks]
. Whurthy.celery.debug_task
. events.tasks.send_email_task
This appears to confirm that the Digital Ocean droplet is starting up a Celery worker successfully (suggesting that the code snippets above are correct) and that the Redis configuration is correct. The two tasks listed when starting Celery are consistent with expectations. However, I am clearly missing something, and cannot rule out that the way Digital Ocean runs droplets is getting in the way.
The baseline test is that the web application sends out an email through the function call. However, as soon as I add .delay() the web page request times out.
I have endeavoured to replicate all that is relevant. I welcome any suggestions to resolve this issue or constructive criticism to improve this question.
Troubleshooting Attempts
Attempt 1
Through the D.O. app console I ran python manage.py shell
I then entered the following into the shell:
>>> from events.tasks import send_email_task
>>> send_email_task
<@task: events.tasks.send_email_task of Whurthy at 0x7fb2f2348dc0>
>>> send_email_task.delay()
At this point the shell hangs/does not respond until I keyboard interrupt.
I then tried the following:
>>> send_email_task.apply()
<EagerResult: 90b7d92c-4f01-423b-a16f-f7a7c75a545c>
AND, the task sends an email!
So, the connection between Django-Redis-Celery appears to work. However, invoking delay() causes the web app to time out and the email to NOT be sent.
So either delay() isn't putting the task on the queue, or it is getting stuck. In either case, this does not appear to be a connection issue. However, because apply() runs the code in the caller's thread, it doesn't actually resolve the issue.
This does suggest it may be an issue with the broker, which in turn may be an issue with the settings...
Made minor changes to broker settings in settings.py
CELERY_BROKER_URL = 'redis://SNIP_FOR_PRIVACY:6379/0'
CELERY_RESULT_BACKEND = 'redis://SNIP_FOR_PRIVACY:6379/1'
delay() still hangs in the shell.
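Before moving on, a direct connectivity check against the broker, from the same environment the Django app runs in, can help separate a networking problem from a Celery configuration problem. A minimal sketch, assuming the redis Python package is available (host snipped as above):

import redis

# A short timeout so this fails fast instead of hanging the way delay() does.
r = redis.Redis(host='SNIP_FOR_PRIVACY', port=6379, db=0, socket_connect_timeout=5)
print(r.ping())  # True means the broker is reachable; a timeout or connection error points at networking/firewall rules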
Attempt 2
I discovered that in Digital Ocean the public IPv4 address does not work when used for the broker URL. By replacing it with the private IP in the CELERY_BROKER_URL setting, I was able to get delay() working within the Django app's shell.
However, while I can now get delay() working in the shell, the original objective still fails: when the respective view loads, the web application hangs.
I am currently researching other approaches. Any suggestions are welcome. Given that I can now get Celery to work through the broker in the shell but not in the web application, I feel like I have made some progress but am still without a solution.
As a side note, I am also trying to make this connection through a Digital Ocean Managed Redis DB, although that is presenting a completely different issue.
Ultimately, the answer I uncovered is a compromise: a workaround using a different Digital Ocean (D.O.) product. The workaround was to use a Managed Database (which simplifies things but gives you much less control) rather than a Droplet (which involves manual Linux/Redis installation and configuration, but gives you greater control). This isn't ideal for two reasons. First, it costs more (a $15 base cost for the managed database versus $6 for a droplet). Second, I would have preferred to work out how to manually set up Redis (and thus maintain greater control). However, I'll take a working solution over no solution for a very niche issue.
The steps to use a D.O. Managed Redis DB are:
Provision the managed Redis DB
Use the Public Network Connection String (as the connection string includes the password I store this in an environment variable)
Ensure that you have the appropriate ssl setting in the 'celery.py' file (snippet below)
celery.py
import os
import ssl

from celery import Celery

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'proj_name.settings')

app = Celery(
    'proj_name',
    broker_use_ssl={'ssl_cert_reqs': ssl.CERT_NONE},
    redis_backend_use_ssl={'ssl_cert_reqs': ssl.CERT_NONE},
)
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()

@app.task(bind=True)
def debug_task(self):
    print(f'Request: {self.request!r}')
settings.py
REDIS_URI = os.environ.get('REDIS_URI')
CELERY_BROKER_URL = f'{REDIS_URI}/0'
CELERY_RESULT_BACKEND = f'{REDIS_URI}/1'
CELERY_TASK_TRACK_STARTED = True
CELERY_TASK_TIME_LIMIT = 30 * 60
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TIMEZONE = TIME_ZONE
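As a footnote, if the managed database's connection string is a rediss:// URL, the certificate requirement can also be expressed in the URL itself instead of in the Celery() constructor. This is only a sketch with illustrative values, not something the setup above depends on:

REDIS_URI = os.environ.get('REDIS_URI')  # e.g. rediss://default:password@host:25061
CELERY_BROKER_URL = f'{REDIS_URI}/0?ssl_cert_reqs=CERT_NONE'
CELERY_RESULT_BACKEND = f'{REDIS_URI}/1?ssl_cert_reqs=CERT_NONE'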

APScheduler running multiple times for the amount of gunicorn workers

I have a Django project with APScheduler built into it. I have now moved to the production environment, so I bound it with gunicorn and nginx in the process. Gunicorn has 3 workers. The problem is that gunicorn initiates APScheduler for each worker and runs the scheduled job 3 times instead of only once.
I have seen similar questions here; it seems to be a common problem. Even the original APScheduler documentation acknowledges the problem and gives no way of fixing it:
https://apscheduler.readthedocs.io/en/stable/faq.html#how-do-i-share-a-single-job-store-among-one-or-more-worker-processes
I saw in other threads that people recommended using --preload in the settings. But I read that --preload initializes the workers with the current code and does not reload when there has been a change in the code (see "when not to preload" in the link below).
https://www.joelsleppy.com/blog/gunicorn-application-preloading/
I also saw someone recommend binding a TCP socket for APScheduler. I did not fully understand it, but basically the idea is to try to bind a socket each time APScheduler is initiated; the second and third workers then hit that already-bound socket and throw a socket error. Sort of a
try:
    "bind socket somehow"
except socketerror:
    print("socket already exists")
else:
    "run apscheduler module"
configuration. Does anyone know how to do it, or whether that would actually work?
Another workaround I thought of is simply removing APScheduler and doing it with the cron function of the server. I am using Digital Ocean, so I could simply delete APScheduler and add a cron job that runs the module instead. However, I do not want to go that way because it will break the "unity" of the whole project and make it server dependent. Does anyone have any more ideas?
Schedule module:
from apscheduler.schedulers.background import BackgroundScheduler
from RENDER.views import dailypuzzlefunc

def start():
    scheduler = BackgroundScheduler()
    scheduler.add_job(dailypuzzlefunc, 'cron', day="*", max_instances=2, id='dailyscheduler')
    scheduler.start()
In the app:
from django.apps import AppConfig

class DailypuzzleConfig(AppConfig):
    default_auto_field = "django.db.models.BigAutoField"
    name = "DAILYPUZZLE"

    def ready(self):
        from SCHEDULER import dailypuzzleschedule
        dailypuzzleschedule.start()
web:
python manage.py collectstatic --no-input;
gunicorn MasjidApp.wsgi --timeout 15 --preload
use --preload.
It's working well for me.
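For completeness, here is a minimal sketch of the socket-lock idea mentioned in the question, for cases where --preload is not suitable. The port number is an arbitrary assumption, and the socket is kept in a module-level variable so it stays bound for the worker's lifetime:

import socket

_scheduler_lock = None  # keep a reference so the bound socket is not garbage collected

def start_scheduler_once():
    global _scheduler_lock
    try:
        # Only one process on this machine can bind the port; the other workers fail here.
        _scheduler_lock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        _scheduler_lock.bind(('127.0.0.1', 47200))
    except OSError:
        print('APScheduler already started by another worker; skipping.')
        return
    from SCHEDULER import dailypuzzleschedule
    dailypuzzleschedule.start()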

Django run function periodically in background

I have a function that fetches data and needs to be run periodically.
All I care about is running it every 30 seconds.
I searched and found the following options -
celery
django-apscheduler
APScheduler
I have tried APScheduler using BackgroundScheduler, and it has the problem that it runs a new scheduler for each process.
I am completely new to scheduling functions and have no idea which one I should use or if there is a better way.
I experienced a similar issue and solved it by creating a custom management command and scheduling it on the web server.
Within the root of the app, create management/commands directory:
some_app/
    __init__.py
    models.py
    management/
        commands/
            dosomething.py
    tests.py
    views.py
# dosomething.py
from django.core.management.base import BaseCommand, CommandError

class Command(BaseCommand):
    help = 'Description of the action here'

    def handle(self, *args, **options):
        print('Doing something')
To confirm if it's working, run python manage.py dosomething
The scheduling itself depends on which web server you are using. In my case it was Heroku and I used their Scheduler add-on.
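If a strict 30-second interval matters (cron and most hosted schedulers bottom out at one minute), another option is to let the management command itself loop. A rough sketch, where fetch_data() is a hypothetical stand-in for the real function:

# dosomething.py
import time
from django.core.management.base import BaseCommand

def fetch_data():
    # hypothetical stand-in for the actual fetching logic
    print('Fetching data...')

class Command(BaseCommand):
    help = 'Fetch data every 30 seconds until stopped'

    def handle(self, *args, **options):
        while True:
            fetch_data()
            time.sleep(30)

The trade-off is that this is a long-lived process, so it needs to be run and supervised separately from the web workers.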

how manage.py can start the iteration of flask

I am making a price tracker. My project structure is this:
Myapp-folder
    manage.py (from the flask_script module)
    subApp-folder
        __init__.py
        form.py
        models.py
        views.py
        pricemonitor-folder
            main.py
            __init__.py
            send_email.py
            price_compare_sendemail.py (with class Compare_sendemail and the start_monitor function)
In main.py, I have an iteration that compares the prices every 60s and sends an email if needed.
from app.PriceMonitor.price_compare_sendmail import Compare_sendemail
break_time = 60 # set waiting time for one crawl round
monitor = Compare_sendemail()
monitor.start_monitor(break_time)
The manage.py is as below:
from flask_script import Manager, Server
from app import app, db

manager = Manager(app)
manager.add_command("runserver", Server(host='127.0.0.1', port=5000, use_debugger=True))

if __name__ == '__main__':
    manager.run()
But the iteration doesn't work when I run python manage.py runserver, while running main.py directly works. How can I write the code so that the Flask server runs with the compare_sendemail iteration running in the background? Thanks.
I think you are looking for Celery.
You can use a Celery background task. If your application has a long-running task, such as processing some uploaded data or sending email, you don't want to wait for it to finish during a request. Instead, use a task queue to send the necessary data to another process that will run the task in the background while the request returns immediately.
Here you can find the documentation for using Celery with Flask:
https://flask.palletsprojects.com/en/1.1.x/patterns/celery/
And if you want to wait for a task to complete, you can use coroutines and tasks:
https://docs.python.org/3/library/asyncio-task.html
There are other options for Flask background tasks, such as:
RQ
https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-xxii-background-jobs
Some other alternatives:
https://smirnov-am.github.io/background-jobs-with-flask/
Threads
uWSGI thread
uWSGI spooler
The uWSGI spooler is great for simple tasks, like sending an OTP SMS or email.
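Since the goal in this question is a fixed 60-second cycle rather than a one-off job, Celery's beat scheduler is the usual fit. A minimal sketch, assuming a local Redis broker and using check_prices as a stand-in for the poster's monitor code; it needs both a worker and a beat process running:

from celery import Celery

celery_app = Celery('pricemonitor', broker='redis://localhost:6379/0')

@celery_app.task
def check_prices():
    # stand-in for one pass of Compare_sendemail's compare-and-email logic
    print('Comparing prices and sending email if needed')

# Fire check_prices every 60 seconds; run `celery beat` alongside a worker.
celery_app.conf.beat_schedule = {
    'check-prices-every-60s': {
        'task': check_prices.name,  # resolves to the task's registered name
        'schedule': 60.0,
    },
}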
I answer part of my own question.
In main.py, I used a while loop and the time module to run price_compare_sendemail.py every 60s. While this is not an ideal background task handler, this project is currently just for my own use, so it is OK for me. My original thought was to use the Flask-Script manager to handle all the Python commands. I don't know if that was the right idea, though, because I have only just started to learn Flask.
After some Google searching, I found a way to use the manager:
from subapp.pricemonitor.main import Start_monitor

Monitor = Start_monitor()

@manager.command
def monitor_start():
    break_time = 10
    Monitor.start_monitoring(break_time)
Then use the command python manage.py monitor_start to start the background task. I don't know if it is useful, but at least it fits my original thought.

django-celery: No result backend configured

I am trying to use django-celery in my project
In settings.py I have
CELERY_RESULT_BACKEND = "amqp"
The server started fine with
python manage.py celeryd --setting=settings
But if I want to access a result from a delayed task, I get the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\celery\result.py", line 108, in ready
    return self.status in self.backend.READY_STATES
  File "C:\Python27\lib\site-packages\celery\result.py", line 196, in status
    return self.state
  File "C:\Python27\lib\site-packages\celery\result.py", line 191, in state
    return self.backend.get_status(self.task_id)
  File "C:\Python27\lib\site-packages\celery\backends\base.py", line 404, in _is_disabled
    raise NotImplementedError("No result backend configured. "
NotImplementedError: No result backend configured. Please see the documentation for more information.
It is very strange because when I just run celeryd (with the same celery settings), it works just fine. Has anyone encountered this problem before?
Thanks in advance!
I had the same problem getting the result back from the Celery task, even though the task itself was executed (per the console logs). What I found was that I had the same setting, CELERY_RESULT_BACKEND = "redis", in the Django settings.py, but I had also instantiated Celery in tasks.py:
celery = Celery('tasks', broker='redis://localhost')
which I suppose overrides the settings.py property, and hence the backend server used to store results was never configured for my Celery instance.
I removed this, let Django and Celery pick up the properties from settings.py, and the sample code worked for me.
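In other words, the fix is to define the task without creating a second Celery instance in tasks.py, so the broker and result backend configured in settings.py actually apply. A rough sketch of that shape:

# tasks.py -- no Celery('tasks', broker=...) here; configuration stays in settings.py
from celery import shared_task

@shared_task
def add(x, y):
    return x + y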
If you're just running the samples from http://www.celeryproject.org/tutorials/first-steps-with-celery/, you need to run the console via manage.py:
% python manage.py shell
For those who are in a desperate search for a solution like I was.
Put this line at the end of the settings.py script:
djcelery.setup_loader()
It looks like django-celery will not pick up its own settings unless they are loaded in a strict order.
In my case, the problem was that I was passing the CELERY_RESULT_BACKEND argument to the celery constructor:
Celery('proj',
       broker='amqp://guest:guest@localhost:5672//',
       CELERY_RESULT_BACKEND='amqp://',
       include=['proj.tasks'])
The solution was to use the backend argument instead:
Celery('proj',
       broker='amqp://guest:guest@localhost:5672//',
       backend='amqp://',
       include=['proj.tasks'])
Somehow the console has to have the Django environment set up in order to pick up the settings. For example, in PyCharm you can run the Django console, in which everything works as expected.
See the AMQP backend settings for a better understanding.
Note: the AMQP backend requires RabbitMQ 1.1.0 or higher to automatically expire results. If you are running an older version of RabbitMQ, you should disable result expiration like this:
CELERY_TASK_RESULT_EXPIRES = None
Try adding the below line to your settings.py:
CELERY_TASK_RESULT_EXPIRES = 18000 # 5 hours