I've been trying to learn Celery over the past week and add it to my project, which uses Django and docker-compose. I'm having a hard time getting it to work: I can't get uploads to my database to succeed when they go through a task. The upload function, insertIntoDatabase, worked fine before Celery was involved, but now uploading doesn't work. When I try to upload, the site reports success suspiciously quickly, but then nothing actually gets uploaded.
The server is started up with docker-compose up, which will make migrations, perform a migrate, collect static files, update requirements, and then start the server. This is all done using pavement.py; the command in the Dockerfile is CMD paver docker_run. At no point is a Celery worker explicitly started; should I be doing that? If so, how?
This is the way I'm calling the upload function in views.py:
insertIntoDatabase.delay(datapoints, user, description)
The upload function is defined in a file named databaseinserter.py. The following decorator was used for insertIntoDatabase:
@shared_task(bind=True, name="database_insert", base=DBTask)
Here is the definition of the DBTask class in celery.py:
class DBTask(Task):
    abstract = True

    def on_failure(self, exc, *args, **kwargs):
        raise exc
I am not really sure what to write for tasks.py. Here is what I was left with by a former co-worker just before I picked up from where he left off:
from celery.decorators import task
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

@task(name="database_insert")
def database_insert(data):
And here are the settings I used to configure Celery (settings.py):
BROKER_TRANSPORT = 'redis'
_REDIS_LOCATION = 'redis://{}:{}'.format(os.environ.get("REDIS_PORT_6379_TCP_ADDR"), os.environ.get("REDIS_PORT_6379_TCP_PORT"))
BROKER_URL = _REDIS_LOCATION + '/0'
CELERY_RESULT_BACKEND = _REDIS_LOCATION + '/1'
CELERY_ACCEPT_CONTENT = ['application/json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_ENABLE_UTC = True
CELERY_TIMEZONE = "UTC"
Now, I'm guessing that database_insert in tasks.py shouldn't be empty, but what should go there instead? Also, nothing in tasks.py seems to happen at all: when I added some logging statements to see whether tasks.py was at least being run, nothing ended up getting logged, which makes me think tasks.py isn't even being executed. How do I properly turn my upload function into a task?
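For reference, code in a tasks.py module is only imported (and therefore registered and logged from) if the project defines a Celery app that performs task autodiscovery. A minimal sketch of such a project-level celery.py, where the project module name proj is an assumption; on older Celery versions autodiscovery needs the INSTALLED_APPS list passed explicitly, as shown in the comment:

```python
# proj/celery.py -- minimal app bootstrap ("proj" is an assumed module name)
import os

from celery import Celery

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'proj.settings')

app = Celery('proj')
# read the BROKER_*/CELERY_* values from Django's settings.py
app.config_from_object('django.conf:settings')
# import tasks.py from every installed app so the tasks get registered;
# on Celery 3.x: app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
app.autodiscover_tasks()
```

Without something like this, the worker never sees the tasks.py modules, which would match the "nothing gets logged" symptom.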
You're not too far off from getting this working, I think.
First, I'd recommend that you keep your Celery tasks and your business logic separate. For example, it makes good sense to keep the business logic for inserting data into your DB in the insertIntoDatabase function, and then separately create a Celery task, perhaps named insert_into_db_task, that takes your args as plain Python objects (important) and calls insertIntoDatabase with those args to actually perform the DB insertion.
Code for that example might look like this:
my_app/tasks/insert_into_db.py
from celery.decorators import task
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

@task()
def insert_into_db_task(datapoints, user, description):
    from my_app.services import insertIntoDatabase
    insertIntoDatabase(datapoints, user, description)
my_app/services/insertIntoDatabase.py
def insertIntoDatabase(datapoints, user, description):
    """Note that this function is not a task, by design."""
    # do db insertion stuff
my_app/views/insert_view.py
from my_app.tasks import insert_into_db_task

def simple_insert_view_func(request, *args, **kwargs):
    # start handling the request; define datapoints, user, description
    # the next line creates the **task** which will later do the db insertion
    insert_into_db_task.delay(datapoints, user, description)
    return Response(201)
The app structure I'm implying is just how I would do it and isn't required. Note also that you can probably use @task() straight up without defining any args for it. That might simplify things for you.
Does that help? I like to keep my tasks light and fluffy. They mostly just do jerk-proofing (make sure the involved objects exist in the DB, for instance), tweak what happens if the task fails (retry later? abort the task? etc.), do logging, and otherwise execute business logic that lives elsewhere.
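The "plain Python objects" advice matters because of the serializer settings shown in the question: with CELERY_TASK_SERIALIZER = 'json', task arguments must survive a round-trip through JSON, which is why you pass primitives (or model IDs) to a task rather than model instances. A quick illustration:

```python
import json

# primitives survive the JSON round-trip Celery performs on task arguments
args = ([1, 2, 3], 42, "a description")
assert json.loads(json.dumps(args)) == [[1, 2, 3], 42, "a description"]

# an arbitrary object (a stand-in for a Django model instance) does not
class FakeUser:
    pass

try:
    json.dumps(FakeUser())
    raise AssertionError("expected a TypeError")
except TypeError:
    pass  # not JSON serializable -- pass user.id (an int) to the task instead
```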
Also, in case it's not obvious: you do need to be running Celery somewhere so that there are workers to actually process the tasks your view code is creating. If you don't run Celery anywhere, your tasks will just stack up in the queue and never get processed (and so your DB insertions will never happen).
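Since nothing in pavement.py starts a worker, you would need to launch one explicitly, either by hand or as a separate docker-compose service. A sketch, where the project module name proj and the redis service name are assumptions based on the setup described in the question:

```shell
# run a worker by hand ("proj" is an assumed project module name)
celery -A proj worker -l info

# or, in docker-compose.yml, as a service alongside the web container:
#
#   worker:
#     build: .
#     command: celery -A proj worker -l info
#     links:
#       - redis
```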
Related
In my project, I use the django-celery-beat package to execute scheduled tasks. It works well, but there is one case I can't handle.
All the tasks have a PeriodicTask that schedules them.
So the following task:
from celery import shared_task

@shared_task
def foo(**kwargs):
    # here I can do things like this:
    whatever_method(kwargs["bar"])
I don't know whether it's luck, but it turns out that kwargs "points" to the kwargs attribute of the PeriodicTask model.
My questions are:
How can I access the PeriodicTask instance that triggered this run of the task?
If I have two PeriodicTasks that use the same shared_task but with different schedules/parameters, how can I find out which one was the source of a particular run?
Thanks in advance for your help.
OK, I found a way to do this.
As I said in the comment, using @app.task solves my needs.
I ended up with a task like this:
@app.task(bind=True)
def foo(self, **kwargs):
    # the information I personally need is in properties of self, like so:
    desired_info = self.request.properties
    # do whatever is needed with desired_info...
    # do whatever else...
Where app is the Celery app as described in the docs.
As I understand it, bind=True is necessary for the task to have its own request, and thus for self to carry that information.
In my Django application I am using Celery. In a post_save signal handler, I update the index in Elasticsearch. But for some reason the task gets hung and never actually executes the code:
What I use to run celery:
celery -A collegeapp worker -l info
The Signal:
@receiver(post_save, sender=University)
def university_saved(sender, instance, created, **kwargs):
    """
    University save signal
    """
    print('calling celery task')
    update_university_index.delay(instance.id)
    print('finished')
The task:
@task(name="update_university_index", bind=True, default_retry_delay=5, max_retries=1, acks_late=True)
def update_university_index(instance_id):
    print('updating university index')
The only output I get is calling celery task. Even after waiting over 30 minutes, it never reaches any of the other print statements and the view continues to wait. Nothing ever shows up in the Celery terminal.
Versions:
Django 3.0,
Celery 4.3,
Redis 5.0.9,
Ubuntu 18
UPDATE:
After doing some testing: using the debug_task defined inside the celery.py file in place of update_university_index does not hang. It behaves as expected. I thought it might be app.task vs. the task decorator, but it seems that's not it.
@app.task(bind=True)
def debug_task(text, second_value):
    print('printing debug_task {} {}'.format(text, second_value))
This happened to me once; I had made the dumbest error. Django expects Celery tasks to be specified in a tasks.py file and uses that file for task discovery. Once I moved my tasks there, it worked. Could you provide more insight into the directory structure, e.g. using the tree command?
This tutorial is for Flask, but the same can be achieved in Django. Where that particular tutorial shines is that after you tell Celery to execute a task, it also provides you with a UUID, and you can ping a URL to monitor the progress of the task you triggered.
Verify that the tasks have been registered with Celery (do make sure that Celery is running first):
from celery.task.control import inspect

i = inspect()
i.registered_tasks()
Or from bash:
$ celery inspect registered
$ celery -A collegeapp inspect registered
From https://docs.celeryproject.org/en/latest/faq.html#the-worker-isn-t-doing-anything-just-hanging
Why is Task.delay/apply*/the worker just hanging?
Answer: There’s a bug in some AMQP clients that’ll make it hang if it’s not able to authenticate the current user, the password doesn’t match or the user doesn’t have access to the virtual host specified. Be sure to check your broker logs (for RabbitMQ that’s /var/log/rabbitmq/rabbit.log on most systems), it usually contains a message describing the reason.
Change this:
@task(name="update_university_index", bind=True, default_retry_delay=5, max_retries=1, acks_late=True)
def update_university_index(instance_id):
    print('updating university index')
to this:
@task(name="update_university_index", bind=True, default_retry_delay=5, max_retries=1, acks_late=True)
def update_university_index(self, instance_id):
    print('updating university index')
In other words, add self as the first parameter of the task, since it is declared with bind=True.
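Why does bind=True change the signature? With bind=True, Celery passes the task instance itself as the first argument, much like a bound method receives self. A toy model of that behavior (plain Python, not the real Celery API):

```python
# toy model of what bind=True does -- not the real Celery API
class TaskWrapper:
    def __init__(self, fn, bind):
        self.fn = fn
        self.bind = bind

    def run(self, *args):
        if self.bind:
            # bound task: the task object is prepended to the arguments,
            # so the wrapped function must accept `self`
            return self.fn(self, *args)
        return self.fn(*args)

def unbound(x):
    return x + 1

def bound(self, x):
    # a bound task can use `self` for retries, request info, etc.
    return (type(self).__name__, x + 1)

assert TaskWrapper(unbound, bind=False).run(1) == 2
assert TaskWrapper(bound, bind=True).run(1) == ('TaskWrapper', 2)
```

This is why a bind=True task defined without self misbehaves: the task instance is silently consumed by the first named parameter.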
I'm still not sure why it doesn't work, but I found a solution by replacing task with app.task.
Importing app from my celery.py seems to have resolved the issue.
from collegeapp.celery import app

@app.task(name="update_university_index", bind=True, default_retry_delay=5, max_retries=1, acks_late=True)
def update_university_index(self, instance_id):
    print('updating university index')
In my project I use transactions by default. I want to disable them for a few Celery tasks. But when I use:
https://docs.djangoproject.com/en/2.0/topics/db/transactions/#django.db.transaction.non_atomic_requests
from django.db import transaction

@transaction.non_atomic_requests
@app.task(bind=True, name='my_task')
def tasks_monitor(task):
    m = MyModel.objects.get(id=1)
    m.value = 5
    m.save()
    time.sleep(40)
my Celery task still runs inside a transaction. It looks like @transaction.non_atomic_requests is not being applied.
UPD: I tried swapping the decorator order; that doesn't work either.
When I set DATABASES['default']['ATOMIC_REQUESTS'] = False, it works as expected.
transaction.non_atomic_requests is meant to decorate a view; it won't have any effect on a Celery task. But here is the thing: the ATOMIC_REQUESTS setting shouldn't have any effect either! The only place Django evaluates it is in core.handlers.base.make_view_atomic, which wraps views.
Therefore, my assumption is that you run your application with CELERY_TASK_ALWAYS_EAGER = True (CELERY_ALWAYS_EAGER in older versions of Celery). In that case the task runs inline, inside the view's transaction. If you set it to False and run your tasks in a worker, each DB operation should be autocommitted.
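To make the eager/worker distinction concrete, here is a toy model (plain Python, not the real Celery API) of what the always-eager switch changes:

```python
# toy model of Celery's "always eager" switch -- not the real API
class Task:
    def __init__(self, fn, eager):
        self.fn = fn
        self.eager = eager
        self.queue = []

    def delay(self, *args):
        if self.eager:
            # eager mode: runs inline, in the caller's process (and
            # therefore inside any transaction the caller has open)
            return self.fn(*args)
        # normal mode: just enqueue; a worker picks the work up later,
        # in its own process, under its own transaction handling
        self.queue.append(args)
        return None

eager = Task(lambda x: x * 2, eager=True)
assert eager.delay(3) == 6          # executed immediately; the caller blocks

queued = Task(lambda x: x * 2, eager=False)
assert queued.delay(3) is None      # the caller returns at once
assert queued.queue == [(3,)]       # the work is waiting for a worker
```

This is why eager mode makes the view's ATOMIC_REQUESTS transaction apply to the "task": the task body is just an ordinary function call inside the request.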
I need to import data from several public APIs for a user after they sign up. django-allauth is included, and I have registered a signal handler to call the right methods after allauth emits user_signed_up.
Because the data import takes too much time and would block the request inside the signal handler, I want to use Celery to do the work.
My test task:
@app.task()
def test_task(username):
    print('##########################Foo#################')
    sleep(40)
    print('##########################' + username + '#################')
    sleep(20)
    print('##########################Bar#################')
    return 3
I'm calling the task like this:
from game_studies_platform.taskapp.celery import test_task

@receiver(user_signed_up)
def on_user_signed_in(sender, request, *args, **kwargs):
    test_task.apply_async('John Doe')
The task should be put into the queue and the request should return immediately, but instead the request blocks and I have to wait a minute.
The project is set up with https://github.com/pydanny/cookiecutter-django and I'm running it in a Docker container.
Celery is configured to use the Django database in development but will use Redis in production.
The solution was to switch CELERY_ALWAYS_EAGER = True to False in local.py. I was pointed to that solution in the Gitter channel of cookiecutter-django.
The calls mentioned above were already correct.
tasks.py
@shared_task(bind=True, default_retry_delay=60, max_retries=3)
def index_city(self, pk):
    from .models import City
    try:
        city = City.objects.get(pk=pk)
    except City.ObjectDoesNotExist:
        self.retry()
    # do stuff here with city
When I call the above task without .delay, it works without issue. When I call the task with .delay in my dev environment with Celery running, it also works fine. However, in production the following exception is thrown:
type object 'City' has no attribute 'ObjectDoesNotExist'
I added time.sleep(10) to rule out any race conditions, but this had no effect and the exception was still thrown. The object does in fact exist, so it seems like the inline import of City is not behaving as expected (the inline import is done to prevent circular import issues). Any ideas on how to fix this would be appreciated.
Stack
Django 1.8.5
Python 2.7.10
sqlite on dev and postgresql on production
You should use City.DoesNotExist or django.core.exceptions.ObjectDoesNotExist instead of City.ObjectDoesNotExist.
See https://docs.djangoproject.com/en/1.9/ref/exceptions/#objectdoesnotexist
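The reason City.ObjectDoesNotExist fails is that Django's model metaclass attaches a per-model DoesNotExist exception (a subclass of ObjectDoesNotExist) to each model class; no attribute literally named ObjectDoesNotExist exists on the model, hence the AttributeError-style message. A stripped-down, pure-Python sketch of that mechanism (not Django's actual implementation, and written in Python 3 metaclass syntax for brevity):

```python
class ObjectDoesNotExist(Exception):
    """Stand-in for django.core.exceptions.ObjectDoesNotExist."""

class ModelBase(type):
    def __new__(mcs, name, bases, attrs):
        cls = super().__new__(mcs, name, bases, attrs)
        # Django's metaclass attaches a model-specific DoesNotExist
        # subclass to every model class it creates
        cls.DoesNotExist = type('DoesNotExist', (ObjectDoesNotExist,), {})
        return cls

class City(metaclass=ModelBase):
    pass

assert issubclass(City.DoesNotExist, ObjectDoesNotExist)  # this is the one to catch
assert not hasattr(City, 'ObjectDoesNotExist')            # hence the error above
```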