Exception in celery task - django

tasks.py
@shared_task(bind=True, default_retry_delay=60, max_retries=3)
def index_city(self, pk):
    from .models import City
    try:
        city = City.objects.get(pk=pk)
    except City.ObjectDoesNotExist:
        self.retry()
    # Do stuff here with City
When I call the above task without .delay, it works without issue. When I call the task with .delay on my dev environment with celery running, it also works fine. However, in production, the following exception is thrown:
type object 'City' has no attribute 'ObjectDoesNotExist'
I added time.sleep(10) to rule out any race conditions, but this had no effect and the exception was still thrown. The object does in fact exist, so it seems like the inline import of City is not happening (the inline import is done to prevent circular import issues). Any ideas on how to fix this would be appreciated.
Stack
Django 1.8.5
Python 2.7.10
sqlite on dev and postgresql on production

You should use City.DoesNotExist or django.core.exceptions.ObjectDoesNotExist instead of City.ObjectDoesNotExist.
See https://docs.djangoproject.com/en/1.9/ref/exceptions/#objectdoesnotexist
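A corrected version of the original task might look like this (a minimal sketch; the only change is catching the documented per-model City.DoesNotExist exception):

@shared_task(bind=True, default_retry_delay=60, max_retries=3)
def index_city(self, pk):
    # Inline import kept from the original to avoid circular imports
    from .models import City
    try:
        city = City.objects.get(pk=pk)
    except City.DoesNotExist:
        # Retry in case the row is not yet visible to the worker
        self.retry()
    # Do stuff here with city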

Related

DJANGO_CELERY_BEAT access PeriodicTask from shared_task

In my project, I use the django-celery-beat package to execute scheduled tasks. It works well, but I have one case that I can't handle.
All the tasks have a PeriodicTask that schedules them.
So the following task:
from celery import shared_task

@shared_task
def foo(**kwargs):
    # Here I can do things like this:
    whatever_method(kwargs["bar"])
I don't know if it is luck, but it turns out that kwargs "points" to the kwargs attribute of the PeriodicTask model.
My questions are:
How can I access the PeriodicTask instance that made the task run?
What if I have 2 PeriodicTasks that use the same shared_task but with different schedules/parameters; will the task be able to tell which one was the source of that particular run?
Thanks in advance for your help.
Ok, I found a way to do this.
As I said in the comment, making use of @app.task solves my needs.
I ended up with a task like this:
@app.task(bind=True)
def foo(self, **kwargs):
    # The information I personally need is in self properties, like so:
    desired_info = self.request.properties
    # Do whatever is needed with desired_info...
    # Do whatever else...
Where app is the Celery app as described in the docs.
The bind=True is, as I understand it, necessary for the task to have its own request, and thus for self to carry that information.
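Building on that, one way to get back to the PeriodicTask row itself might look like the sketch below. Note the periodic_task_name lookup is an assumption: newer django-celery-beat versions attach the schedule's name to the message headers, and where exactly it surfaces on self.request depends on your Celery/django-celery-beat versions, so inspect self.request in your own setup first.

from django_celery_beat.models import PeriodicTask

@app.task(bind=True)
def foo(self, **kwargs):
    # ASSUMPTION: the beat scheduler put the schedule's name on the request
    name = getattr(self.request, "periodic_task_name", None)
    if name is not None:
        periodic_task = PeriodicTask.objects.get(name=name)
        # PeriodicTask names are unique, so two schedules that share the
        # same shared_task are still told apart here.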

Django Celery: Task never executes

In my django application I am using celery. In a post_save signal, I am updating the index in Elasticsearch. But for some reason the task hangs and never actually executes the code:
What I use to run celery:
celery -A collegeapp worker -l info
The Signal:
@receiver(post_save, sender=University)
def university_saved(sender, instance, created, **kwargs):
    """
    University save signal
    """
    print('calling celery task')
    update_university_index.delay(instance.id)
    print('finished')
The task:
#task(name="update_university_index", bind=True, default_retry_delay=5, max_retries=1, acks_late=True)
def update_university_index(instance_id):
print('updating university index')
The only output I get is calling celery task. Even after waiting over 30 minutes, it never reaches any of the other print statements and the view continues to wait. Nothing ever shows up in the celery terminal.
Versions:
Django 3.0,
Celery 4.3,
Redis 5.0.9,
Ubuntu 18
UPDATE:
after doing some testing, using the debug_task defined inside the celery.py file in place of update_university_index does not lead to hanging. It behaves as expected. I thought maybe it could have been app.task vs task decorator, but it seems that's not it.
@app.task(bind=True)
def debug_task(text, second_value):
    print('printing debug_task {} {}'.format(text, second_value))
This happened to me once; I had made the dumbest error. Django tells us to put celery tasks in a tasks.py file, and Celery uses that file for task discovery. After I moved my tasks there, it worked. Could you provide more insight into the directory structure, e.g. using the tree command?
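For reference, that autodiscovery is normally wired up in the project's celery.py; a minimal sketch, with module names assumed from the question's collegeapp project:

# collegeapp/celery.py
import os
from celery import Celery

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'collegeapp.settings')

app = Celery('collegeapp')
app.config_from_object('django.conf:settings', namespace='CELERY')
# Scans every installed Django app for a tasks.py module
app.autodiscover_tasks()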
This tutorial is for Flask, but the same can be achieved in Django. Where that particular tutorial shines is that after you tell celery to execute a task, it also gives you a uuid, and you can ping a url with it to monitor the progress of the task you triggered.
Verify that the tasks have been registered by celery (do make sure that celery is running) using:
from celery.task.control import inspect
i = inspect()
i.registered_tasks()
Or in bash:
$ celery inspect registered
$ celery -A collegeapp inspect registered
From https://docs.celeryproject.org/en/latest/faq.html#the-worker-isn-t-doing-anything-just-hanging
Why is Task.delay/apply*/the worker just hanging?
Answer: There’s a bug in some AMQP clients that’ll make it hang if it’s not able to authenticate the current user, the password doesn’t match or the user doesn’t have access to the virtual host specified. Be sure to check your broker logs (for RabbitMQ that’s /var/log/rabbitmq/rabbit.log on most systems), it usually contains a message describing the reason.
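As a quick first check (assuming the collegeapp app from the question), pinging the workers goes through the broker, so if this hangs or errors the broker connection is a likely suspect:
$ celery -A collegeapp inspect ping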
Change this line
@task(name="update_university_index", bind=True, default_retry_delay=5, max_retries=1, acks_late=True)
def update_university_index(instance_id):
    print('updating university index')
To
@task(name="update_university_index", bind=True, default_retry_delay=5, max_retries=1, acks_late=True)
def update_university_index(self, instance_id):
    print('updating university index')
In other words, add self to the task definition: with bind=True, the task instance is passed as the first positional argument, so the original signature received the task instance as instance_id.
I'm still not sure why it didn't work, but I found a solution by replacing task with app.task.
Importing app from my celery.py seems to have resolved the issue.
from collegeapp.celery import app

@app.task(name="update_university_index", bind=True, default_retry_delay=5, max_retries=1, acks_late=True)
def update_university_index(self, instance_id):
    print('updating university index')

Django testing - StaticLiveServerTestCase - client.cookies empty if there was another test case before

I'm testing my Django application with Selenium in Docker. I've encountered a peculiar thing related to cookie availability (I use cookies to authenticate in my tests).
Here is the code that works:
from django.contrib.staticfiles.testing import StaticLiveServerTestCase
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from users.models import CustomUser
class SomeTest(StaticLiveServerTestCase):
    @classmethod
    def setUpClass(cls):
        cls.host = "web"  # Docker service name
        super().setUpClass()
        CustomUser.objects.create_user(username="user", password="password")

    def setUp(self):
        self.browser = webdriver.Remote("http://selenium:4444/wd/hub", DesiredCapabilities.FIREFOX)

    def tearDown(self):
        self.browser.quit()

    def test2(self):
        self.client.login(username="user", password="password")
        cookie = self.client.cookies["sessionid"]
        ...
However, when I insert another test case before test2, even something as simple as
def test1(self):
    pass
then the code crashes with the following error:
Traceback (most recent call last):
  File "/home/mysite/functional_tests/test.py", line 28, in test2
    cookie = self.client.cookies["sessionid"]
KeyError: 'sessionid'
So the only difference between the working and non-working code is a dummy test function, but what does it change? As far as I know, the setUp and tearDown methods make sure that the "environment" is the same for every test case, no matter what happens in other test methods, yet here the outcome clearly depends on the (non-)existence of other test cases run before my test. Is there something I misunderstand? Or is it some kind of a bug?
Any help will be appreciated.
My setup:
Django==2.2.5
selenium==3.141.0
Docker version - 19.03.5
I've solved it and I'm posting the answer here in case anyone else encounters similar issues.
So the problem here was not with test case order, Docker, Selenium, or anything within the code itself, but with my lack of understanding of how the StaticLiveServerTestCase class behaves. Namely, this class inherits from LiveServerTestCase, which in turn inherits from TransactionTestCase, which tears down the database after each test case (and sets it up again before the next one); more on this can be found in the Django docs. And as I was creating the user in setUpClass, which is run once for all the test cases in the class, the user was indeed created, but then removed (together with the whole database) after the first test case. So when I was doing self.client.login(username="user", password="password"), it was not a problem with cookies or authentication per se, but with the fact that the user simply didn't exist.
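A minimal sketch of the resulting fix, using the names from the question (the user is created in setUp, so it is recreated after each per-test database teardown):

class SomeTest(StaticLiveServerTestCase):
    @classmethod
    def setUpClass(cls):
        cls.host = "web"  # Docker service name
        super().setUpClass()

    def setUp(self):
        # Recreated before every test, surviving TransactionTestCase's
        # per-test database flush
        CustomUser.objects.create_user(username="user", password="password")
        self.browser = webdriver.Remote("http://selenium:4444/wd/hub", DesiredCapabilities.FIREFOX)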
:-)

Get update query when DEBUG is False, without affecting code execution

I would like to view the queries run inside a block of code, ideally getting them as a list of strings.
Of course there are similar SO questions and answers, but they do not address my three specific requirements:
Works for queries other than SELECT.
Works when the code is not in DEBUG mode.
The code executes normally, i.e. any production code runs as production code.
What I have so far is a transaction inside a DEBUG=True override, which is rolled back as soon as the queries have been collected.
from contextlib import contextmanager
from django.conf import settings
from django.db import connections
from django.db import transaction
from django.test.utils import override_settings

@contextmanager
@override_settings(DEBUG=True)
def print_query():
    class OhNo(Exception):
        pass

    queries = []
    try:
        with transaction.atomic():
            yield
            for connection in connections:
                queries.extend(connections[connection].queries)
            print(queries)
            raise OhNo
    except OhNo:
        pass

def do_special_debug_thing():
    print('haha yes')

with print_query():
    Foo.objects.update(bar=1)
    if settings.DEBUG:
        do_special_debug_thing()
There are two problems with that snippet:
That DEBUG override doesn't do anything: the context manager prints out [].
If the DEBUG override were effective, then do_special_debug_thing would be called, which I do not want to happen.
So, as far as I know, there is no way to collect all queries made inside a block of code, including those that are SELECT statements, while DEBUG is off. What ways are there to achieve this?
If you only need to do this once, getting the queries separately and putting them in a list could help you:
update = Foo.objects.filter(bar=1)
query = str(update.query)
print(query)
Note that str(queryset.query) shows the SQL for evaluating the queryset (a SELECT here), not the UPDATE statement that .update() would run.
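Another route, not covered above: on Django 2.0 and later there is connection.execute_wrapper(), which sees every statement (SELECTs included) on that connection regardless of DEBUG and without changing how the code runs. A minimal sketch, assuming the default database connection and the Foo model from the question:

from contextlib import contextmanager
from django.db import connection

@contextmanager
def collect_queries(queries):
    """Append the SQL of every query run inside the block to queries."""
    def wrapper(execute, sql, params, many, context):
        queries.append(sql)
        return execute(sql, params, many, context)  # run the query normally
    with connection.execute_wrapper(wrapper):
        yield

queries = []
with collect_queries(queries):
    Foo.objects.update(bar=1)
print(queries)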

Celery+Docker+Django -- Getting tasks to work

I've been trying to learn Celery over the past week while adding it to my project that uses Django and Docker-Compose. I am having a hard time understanding how to get it to work; my issue is that I can't seem to get uploads to my database to work when using tasks. The upload function, insertIntoDatabase, worked fine before without any involvement with Celery, but now uploading doesn't work. Indeed, when I try to upload, my website tells me too quickly that the upload was successful, but then nothing actually gets uploaded.
The server is started up with docker-compose up, which will make migrations, perform a migrate, collect static files, update requirements, and then start the server. This is all done using pavement.py; the command in the Dockerfile is CMD paver docker_run. At no point is a Celery worker explicitly started; should I be doing that? If so, how?
This is the way I'm calling the upload function in views.py:
insertIntoDatabase.delay(datapoints, user, description)
The upload function is defined in a file named databaseinserter.py. The following decorator was used for insertIntoDatabase:
@shared_task(bind=True, name="database_insert", base=DBTask)
Here is the definition of the DBTask class in celery.py:
class DBTask(Task):
    abstract = True

    def on_failure(self, exc, *args, **kwargs):
        raise exc
I am not really sure what to write in tasks.py. Here is what a former co-worker left me with just before I picked up where he left off:
from celery.decorators import task
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

@task(name="database_insert")
def database_insert(data):
And here are the settings I used to configure Celery (settings.py):
BROKER_TRANSPORT = 'redis'
_REDIS_LOCATION = 'redis://{}:{}'.format(os.environ.get("REDIS_PORT_6379_TCP_ADDR"), os.environ.get("REDIS_PORT_6379_TCP_PORT"))
BROKER_URL = _REDIS_LOCATION + '/0'
CELERY_RESULT_BACKEND = _REDIS_LOCATION + '/1'
CELERY_ACCEPT_CONTENT = ['application/json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_ENABLE_UTC = True
CELERY_TIMEZONE = "UTC"
Now, I'm guessing that database_insert in tasks.py shouldn't be empty, but what should go there instead? Also, it doesn't seem like anything in tasks.py happens anyway; when I added some logging statements to see if tasks.py was at least being run, nothing actually ended up getting logged, making me think that tasks.py isn't even being run. How do I properly make my upload function into a task?
You're not too far off from getting this working, I think.
First, I'd recommend that you try to keep your Celery tasks and your business logic separate. So, for example, it probably makes good sense to keep the business logic involved with inserting your data into your DB in the insertIntoDatabase function, and then separately create a Celery task, perhaps named insert_into_db_task, that takes your args as plain python objects (important) and calls the aforementioned insertIntoDatabase function with those args to actually complete the DB insertion.
Code for that example might look like this:
my_app/tasks/insert_into_db.py
from celery.decorators import task
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

@task()
def insert_into_db_task(datapoints, user, description):
    from my_app.services import insertIntoDatabase
    insertIntoDatabase(datapoints, user, description)
my_app/services/insertIntoDatabase.py
def insertIntoDatabase(datapoints, user, description):
    """Note that this function is not a task, by design"""
    # do db insertion stuff
my_app/views/insert_view.py
from my_app.tasks import insert_into_db_task

def simple_insert_view_func(request, *args, **kwargs):
    # start handling request, define datapoints, user, description
    # the next line creates the **task** which will later do the db insertion
    insert_into_db_task.delay(datapoints, user, description)
    return Response(201)
The app structure I'm implying is just how I would do it and isn't required. Note also that you can probably use @task() straight up without defining any args for it. That might simplify things for you.
Does that help? I like to keep my tasks light and fluffy. They mostly just do jerk-proofing (make sure the involved objects exist in the DB, for instance), tweak what happens if the task fails (retry later? abort the task? etc.), and logging; otherwise they execute business logic that lives elsewhere.
Also, in case it's not obvious, you do need to be running celery somewhere so that there are workers to actually process the tasks that your view code creates. If you don't run celery anywhere, your tasks will just stack up in the queue and never get processed (and so your DB insertions will never happen).
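To answer the "should I be starting a worker, and how?" part directly: yes. With docker-compose, one common pattern is a second service that shares the web image but runs the worker command instead; a sketch along these lines, where the service layout and the my_project module name are assumptions to adapt to your project:

# docker-compose.yml fragment (hypothetical names)
worker:
  build: .
  command: celery -A my_project worker -l info  # replace my_project with your Celery app module
  links:
    - redis
redis:
  image: redis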