In my Django application I am using Celery. In a post_save signal, I update the index in Elasticsearch. But for some reason the task gets stuck and never actually executes the code:
What I use to run celery:
celery -A collegeapp worker -l info
The Signal:
@receiver(post_save, sender=University)
def university_saved(sender, instance, created, **kwargs):
    """
    University save signal
    """
    print('calling celery task')
    update_university_index.delay(instance.id)
    print('finished')
The task:
@task(name="update_university_index", bind=True, default_retry_delay=5, max_retries=1, acks_late=True)
def update_university_index(instance_id):
    print('updating university index')
The only output I get is calling celery task. After waiting over 30 minutes, it never reaches any of the other print statements and the view continues to wait. Nothing ever shows in the celery terminal.
Versions:
Django 3.0,
Celery 4.3,
Redis 5.0.9,
Ubuntu 18
UPDATE:
after doing some testing, using the debug_task defined inside the celery.py file in place of update_university_index does not lead to hanging. It behaves as expected. I thought it might be app.task vs. the task decorator, but it seems that's not it.
@app.task(bind=True)
def debug_task(text, second_value):
    print('printing debug_task {} {}'.format(text, second_value))
This happened to me once; I had made the dumbest error. Django tells us to define Celery tasks in a tasks.py file and uses that file for task discovery; after I moved my tasks there, it worked. Could you provide more insight into the directory structure, using the tree command?
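For reference, the tasks.py discovery mentioned here is wired up in the project's celery.py. A typical configuration sketch (the collegeapp name is assumed from the question's worker command):

```python
# collegeapp/celery.py -- a typical setup sketch
import os

from celery import Celery

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'collegeapp.settings')

app = Celery('collegeapp')
app.config_from_object('django.conf:settings', namespace='CELERY')
# Looks for a tasks.py module in each installed Django app:
app.autodiscover_tasks()
```

If the tasks live in a module not named tasks.py, autodiscovery will not find them and the worker's "registered tasks" banner will not list them at startup.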
This tutorial is for Flask, but the same can be achieved in Django. Where this particular tutorial shines is that after you tell Celery to execute a task, it also gives you a UUID, and you can ping that URL to monitor the progress of the task you triggered.
Verify that the tasks have been registered by Celery (do make sure that Celery is running):
from celery.task.control import inspect
i = inspect()
i.registered_tasks()
Or from bash:
$ celery inspect registered
$ celery -A collegeapp inspect registered
From https://docs.celeryproject.org/en/latest/faq.html#the-worker-isn-t-doing-anything-just-hanging
Why is Task.delay/apply*/the worker just hanging?
Answer: There’s a bug in some AMQP clients that’ll make it hang if it’s not able to authenticate the current user, the password doesn’t match or the user doesn’t have access to the virtual host specified. Be sure to check your broker logs (for RabbitMQ that’s /var/log/rabbitmq/rabbit.log on most systems), it usually contains a message describing the reason.
Change this line
@task(name="update_university_index", bind=True, default_retry_delay=5, max_retries=1, acks_late=True)
def update_university_index(instance_id):
    print('updating university index')
To
@task(name="update_university_index", bind=True, default_retry_delay=5, max_retries=1, acks_late=True)
def update_university_index(self, instance_id):
    print('updating university index')
That is, add self as the first parameter of the task definition, because bind=True passes the task instance as the first positional argument.
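To see why bind=True without self breaks, here is a plain-Python analogy (no Celery involved, and the class below is purely illustrative): a bound task is always called with the task instance as its first positional argument, so a function that only declares instance_id receives one argument too many. In the real setup the worker swallows or retries the error rather than printing it, which is why it looks like a hang.

```python
class BoundTask:
    """Tiny stand-in for a Celery task with bind=True (illustration only)."""
    def __init__(self, run):
        self.run = run

    def __call__(self, *args, **kwargs):
        # bind=True means the task instance is passed as the first argument
        return self.run(self, *args, **kwargs)

def update_university_index(instance_id):   # missing `self`
    return instance_id

task = BoundTask(update_university_index)
try:
    task(42)
except TypeError as exc:
    print(exc)  # prints the argument-mismatch error
```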
I'm still not sure why it doesn't work, but I found a solution by replacing task with app.task. Importing app from my celery.py seems to have resolved the issue.
from collegeapp.celery import app

@app.task(name="update_university_index", bind=True, default_retry_delay=5, max_retries=1, acks_late=True)
def update_university_index(self, instance_id):
    print('updating university index')
Related
In my project, I use the django-celery-beat package to execute scheduled tasks. It works well, but I have one case that I can't handle.
All the tasks have a PeriodicTask that schedules them.
So the following task:
from celery import shared_task

@shared_task
def foo(**kwargs):
    # Here I can do things like this:
    whatever_method(kwargs["bar"])
I don't know if it is luck, but it turns out that kwargs "points" to the kwargs attribute of the PeriodicTask model.
My questions are:
How can I access the PeriodicTask instance that made the task run?
What if I have two PeriodicTasks that use the same shared_task but with different schedules/parameters? Will it be possible to find out which one was the source of that particular run?
Thanks in advance for your help.
OK, I found a way to do this.
As I said in the comment, making use of @app.task solves my needs.
I ended up with a task like this:
@app.task(bind=True)
def foo(self, **kwargs):
    # The information I personally need is in self's properties, like so:
    desired_info = self.request.properties
    # Do whatever is needed with desired_info...
    # Do whatever else...
Where app is the Celery app as described in the docs.
The bind=True is, as I understand it, necessary to give the task its own request, and thus access to self and the information it carries.
I use multiple post_save functions to trigger different Celery (4.4.0, 4.8.3) tasks, and I have tried Django 2 and 3. For some strange reason Celery stopped executing all tasks in parallel; instead, only one task gets received each time the model is saved. The other tasks are not even received.
To run all the tasks, I have to save the model multiple times. It was working before and I have no idea why the behavior changed all of a sudden.
I am starting the queue with:
celery -A appname worker -l info -E
My post_save functions:
@receiver(models.signals.post_save, sender=RawFile)
def execute_rawtools_qc(sender, instance, created, *args, **kwargs):
    rawtools_qc.delay(instance.path, instance.path)

@receiver(models.signals.post_save, sender=RawFile)
def execute_rawtools_metrics(sender, instance, created, *args, **kwargs):
    rawtools_metrics.delay(instance.abs_path, instance.path)
And my tasks:
@shared_task
def rawtools_metrics(raw, output_dir):
    cmd = rawtools_metrics_cmd(raw=raw, output_dir=output_dir)
    os.system(cmd)

@shared_task
def rawtools_qc(input_dir, output_dir):
    cmd = rawtools_qc_cmd(input_dir=input_dir, output_dir=output_dir)
    os.system(cmd)
Before, those tasks were executed in parallel as soon as the model was saved. Now the first task gets executed when the model instance is saved, the second is executed the second time the model is saved, and the functions then alternate on each save. Any idea what may cause this strange behavior?
UPDATE: I think both tasks are executed randomly, but only one per save.
Also, there are no other celery workers running.
If you are running both functions for the same model, run them in the same post_save handler:
@receiver(models.signals.post_save, sender=RawFile)
def execute_rawtools_qc(sender, instance, created, *args, **kwargs):
    rawtools_qc.delay(instance.path, instance.path)
    rawtools_metrics.delay(instance.abs_path, instance.path)
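As an aside, the tasks in the question shell out with os.system, which silently ignores a non-zero exit status. A sketch using subprocess.run (the cmd strings are assumed to be the same ones built by the question's helper functions) would make a failing command show up as a failed task in the worker log:

```python
import subprocess

def run_cmd(cmd):
    # Unlike os.system, this raises CalledProcessError on a non-zero
    # exit status, so a broken command surfaces instead of passing silently.
    return subprocess.run(cmd, shell=True, check=True,
                          capture_output=True, text=True)

result = run_cmd('echo rawtools finished')
print(result.stdout.strip())  # rawtools finished
```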
I need to import data from several public APIs for a user after he signs up. django-allauth is included, and I have registered a signal handler to call the right methods after allauth emits user_signed_up.
Because the data import takes too much time and the request is blocked by the signal, I want to use Celery to do the work.
My test task:
@app.task()
def test_task(username):
    print('##########################Foo#################')
    sleep(40)
    print('##########################' + username + '#################')
    sleep(20)
    print('##########################Bar#################')
    return 3
I'm calling the task like this:
from game_studies_platform.taskapp.celery import test_task

@receiver(user_signed_up)
def on_user_signed_in(sender, request, *args, **kwargs):
    test_task.apply_async('John Doe')
The task should be put into the queue and the request should return immediately, but the request blocks and I have to wait a minute.
The project is setup with https://github.com/pydanny/cookiecutter-django and I'm running it in a docker container.
Celery is configured to use the Django database in development, but will use Redis in production.
The solution was to switch CELERY_ALWAYS_EAGER from True to False in local.py. I was pointed to that solution in the Gitter channel of cookiecutter-django.
The calls mentioned above were already correct.
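For reference, the relevant toggle looks roughly like this (a sketch; the file path follows the cookiecutter-django layout, and these are the old-style Celery setting names):

```python
# config/settings/local.py (development settings) -- a sketch
# With CELERY_ALWAYS_EAGER = True, .delay() / .apply_async() execute the
# task synchronously in the calling process, which blocks the request.
CELERY_ALWAYS_EAGER = False  # hand tasks to a real worker instead

# Only relevant while running eagerly: re-raise task exceptions in the
# caller instead of hiding them behind the result object.
CELERY_EAGER_PROPAGATES_EXCEPTIONS = True
```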
I've been trying to learn Celery over the past week and adding it to my project that uses Django and Docker-Compose. I am having a hard time understanding how to get it to work; my issue is that I can't seem to get uploading to my database to work when using tasks. The upload function, insertIntoDatabase, was working fine before without any involvement with Celery but now uploading doesn't work. Indeed, when I try to upload, my website tells me too quickly that the upload was successful, but then nothing actually gets uploaded.
The server is started up with docker-compose up, which will make migrations, perform a migrate, collect static files, update requirements, and then start the server. This is all done using pavement.py; the command in the Dockerfile is CMD paver docker_run. At no point is a Celery worker explicitly started; should I be doing that? If so, how?
This is the way I'm calling the upload function in views.py:
insertIntoDatabase.delay(datapoints, user, description)
The upload function is defined in a file named databaseinserter.py. The following decorator was used for insertIntoDatabase:
@shared_task(bind=True, name="database_insert", base=DBTask)
Here is the definition of the DBTask class in celery.py:
class DBTask(Task):
    abstract = True

    def on_failure(self, exc, *args, **kwargs):
        raise exc
I am not really sure what to write for tasks.py. Here is what I was left with by a former co-worker just before I picked up from where he left off:
from celery.decorators import task
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

@task(name="database_insert")
def database_insert(data):
And here are the settings I used to configure Celery (settings.py):
BROKER_TRANSPORT = 'redis'
_REDIS_LOCATION = 'redis://{}:{}'.format(os.environ.get("REDIS_PORT_6379_TCP_ADDR"), os.environ.get("REDIS_PORT_6379_TCP_PORT"))
BROKER_URL = _REDIS_LOCATION + '/0'
CELERY_RESULT_BACKEND = _REDIS_LOCATION + '/1'
CELERY_ACCEPT_CONTENT = ['application/json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_ENABLE_UTC = True
CELERY_TIMEZONE = "UTC"
Now, I'm guessing that database_insert in tasks.py shouldn't be empty, but what should go there instead? Also, it doesn't seem like anything in tasks.py happens anyway--when I added some logging statements to see if tasks.py was at least being run, nothing actually ended up getting logged, making me think that tasks.py isn't even being run. How do I properly make my upload function into a task?
You're not too far off from getting this working, I think.
First, I'd recommend that you try to keep your Celery tasks and your business logic separate. So, for example, it probably makes good sense to keep the business logic for inserting your data into your DB in the insertIntoDatabase function, and then separately create a Celery task, perhaps named insert_into_db_task, that takes your args as plain Python objects (important) and calls the aforementioned insertIntoDatabase function with those args to actually complete the DB insertion.
Code for that example might look like this:
my_app/tasks/insert_into_db.py
from celery.decorators import task
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

@task()
def insert_into_db_task(datapoints, user, description):
    from my_app.services import insertIntoDatabase
    insertIntoDatabase(datapoints, user, description)
my_app/services/insertIntoDatabase.py
def insertIntoDatabase(datapoints, user, description):
    """Note that this function is not a task, by design"""
    # do db insertion stuff
my_app/views/insert_view.py
from my_app.tasks import insert_into_db_task

def simple_insert_view_func(request, *args, **kwargs):
    # start handling request, define datapoints, user, description
    # next line creates the **task** which will later do the db insertion
    insert_into_db_task.delay(datapoints, user, description)
    return Response(201)
The app structure I'm implying is just how I would do it and isn't required. Note also that you can probably use @task() straight up without defining any args for it. That might simplify things for you.
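A quick way to see what "plain Python objects (important)" means in practice: with the JSON serializer from the settings shown earlier, anything you pass to .delay() must survive a JSON round trip (the values below are made up for illustration):

```python
import json

# Fine: primitives, lists, and dicts round-trip through JSON unchanged.
payload = {'datapoints': [[1, 2.5], [2, 3.1]], 'user': 42, 'description': 'run A'}
assert json.loads(json.dumps(payload)) == payload

# Not fine: model instances (or querysets, file handles, etc.) are not
# JSON-serializable; pass the primary key and re-fetch inside the task.
class FakeModel:
    pass

try:
    json.dumps({'user': FakeModel()})
except TypeError:
    print('pass the primary key instead, and re-fetch inside the task')
```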
Does that help? I like to keep my tasks light and fluffy. They mostly just do jerk-proofing (make sure the involved objects exist in the DB, for instance), tweak what happens if the task fails (retry later? abort the task? etc.), do logging, and otherwise execute business logic that lives elsewhere.
Also, in case it's not obvious: you do need to be running Celery somewhere so that there are workers to actually process the tasks your view code is creating. If you don't run Celery anywhere, your tasks will just stack up in the queue and never get processed (and your DB insertions will never happen).
I use:
Celery
Django-Celery
RabbitMQ
I can see all my tasks in the Django admin page, but at the moment it has just a few states, like:
RECEIVED
RETRY
REVOKED
SUCCESS
STARTED
FAILURE
PENDING
It's not enough information for me. Is it possible to add more details about a running process to the admin page, like a progress bar or a finished-jobs counter?
I know how to use the Celery logging functions, but a GUI is better in my case for several reasons.
So, is it possible to send some tracing information to the Django-Celery admin page?
Here's my minimal progress-reporting Django backend using your setup. I'm still a Django n00b and it's the first time I'm messing with Celery, so this can probably be optimized.
from time import sleep

from celery import task, current_task
from celery.result import AsyncResult
from django.http import HttpResponse, HttpResponseRedirect
from django.core.urlresolvers import reverse
from django.utils import simplejson as json
from django.conf.urls import patterns, url

@task()
def do_work():
    """ Get some rest, asynchronously, and update the state all the time """
    for i in range(100):
        sleep(0.1)
        current_task.update_state(state='PROGRESS',
                                  meta={'current': i, 'total': 100})

def poll_state(request):
    """ A view to report the progress to the user """
    if 'job' in request.GET:
        job_id = request.GET['job']
    else:
        return HttpResponse('No job id given.')
    job = AsyncResult(job_id)
    data = job.result or job.state
    return HttpResponse(json.dumps(data), mimetype='application/json')

def init_work(request):
    """ A view to start a background job and redirect to the status page """
    job = do_work.delay()
    return HttpResponseRedirect(reverse('poll_state') + '?job=' + job.id)

urlpatterns = patterns('webapp.modules.asynctasks.progress_bar_demo',
    url(r'^init_work$', init_work),
    url(r'^poll_state$', poll_state, name="poll_state"),
)
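On the client side, the JSON returned by poll_state is either the PROGRESS meta dict or a plain state string. A small helper (hypothetical, not part of the answer above) shows how it can be turned into a percentage for a progress bar:

```python
def progress_percent(data):
    """Turn poll_state's payload into a 0-100 display value.

    `data` is either {'current': i, 'total': 100} while the task runs,
    or a state string such as 'SUCCESS' or 'PENDING' otherwise.
    """
    if isinstance(data, dict):
        return int(100 * data['current'] / data['total'])
    return 100 if data == 'SUCCESS' else 0

print(progress_percent({'current': 25, 'total': 100}))  # 25
print(progress_percent('SUCCESS'))  # 100
print(progress_percent('PENDING'))  # 0
```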
I am starting to figure this out myself. Start by defining a PROGRESS state exactly as explained in the Celery user guide; then all you need is to insert a bit of JS in your template that updates your progress bar.
Thanks @Florian Sesser for your example!
I made a complete Django app that shows users the progress of creating 1000 objects, at http://iambusychangingtheworld.blogspot.com/2013/07/django-celery-display-progress-bar-of.html
Everyone can download and use it!
I would recommend a library called celery-progress for this. It is designed to make it as easy as possible to drop a basic end-to-end progress-bar setup into a Django app with as little scaffolding as possible, while also supporting heavy customization on the front end if desired. There are lots of docs and references for getting started in the README.
Full disclosure: I am the author/maintainer of said library.