django-nose unit testing a celery task ... missing database data

I'm writing unit tests for a celery task using django-nose. The setup is fairly typical: a blank test database (REUSE_DB=0) that is pre-populated via a fixture at test time.
The problem I have is that even though the TestCase loads the fixture and I can access the objects from the test method, the same query fails when executed within an async celery task.
I've checked that settings.DATABASES["default"]["NAME"] is the same in both the test method and the task under test. I've also validated that the task under test behaves correctly when invoked as a regular method call.
And that's about where I'm out of ideas.
Here's a sample:
class MyTest(TestCase):
    fixtures = ['test_data.json']

    def setUp(self):
        settings.CELERY_ALWAYS_EAGER = True  # seems to be required; without it I get socket errors for RabbitMQ
        settings.CELERY_EAGER_PROPAGATES_EXCEPTIONS = True  # exposes errors in the code under test

    def test_city(self):
        self.assertIsNotNone(City.objects.get(name='brisbane'))
        myTask.delay(city_name='brisbane').get()
        # The following works fine: myTask('brisbane')

from celery.task import task

@task()
def myTask(city_name):
    c = City.objects.count()  # gives 0
    my_city = City.objects.get(name=city_name)  # raises DoesNotExist
    return

This sounds a lot like a bug in django-celery 2.5 that was fixed in 2.5.2: https://github.com/celery/django-celery/pull/116
In brief, the django-celery loader was closing the DB connection prior to executing the task, even for eager tasks. Since the tests run inside a transaction, the new connection opened for the task execution can't see the data loaded in setUp.
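If upgrading django-celery isn't an option, a minimal workaround sketch (reusing the fixture and task from the question) is to switch to TransactionTestCase, which commits fixture data instead of holding it inside an uncommitted test transaction, so a task that opens a fresh DB connection can still see it:

from django.test import TransactionTestCase

class MyTest(TransactionTestCase):
    # Fixture rows are committed per test (and flushed afterwards),
    # so they stay visible even on a freshly opened connection.
    fixtures = ['test_data.json']

    def test_city(self):
        myTask.delay(city_name='brisbane').get()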

Related

Mock async_task of Django-q

I'm using django-q and I'm currently adding tests for my existing tasks using mock. I could easily test each task without depending on django-q, but one of my tasks calls another task via async_task. Here's an example:
import requests
from django_q.tasks import async_task

def task_a():
    response = requests.get(url)
    # process response here
    if condition:
        async_task('task_b')

def task_b():
    response = requests.get(another_url)
And here's how I test them:
from unittest import mock

import requests

from .tasks import task_a
from .mock_responses import task_a_response

@mock.patch.object(requests, "get")
@mock.patch("django_q.tasks.async_task")
def test_async_task(self, mock_async_task, mock_task_a):
    mock_task_a.return_value.status_code = 200
    mock_task_a.return_value.json.return_value = task_a_response
    mock_async_task.return_value = "12345"
    # execute the task
    task_a()
    self.assertTrue(mock_task_a.called)
    self.assertTrue(mock_async_task.called)
I know for a fact that async_task returns the task ID, hence the line mock_async_task.return_value = "12345". However, after running the test, the assertion on mock_async_task fails (called is False) and the task is still being added to the queue (I can see a bunch of 01:42:59 [Q] INFO Enqueued 1 lines from the server), which is what I'm trying to avoid. Is there any way to accomplish this?
In order to prevent the task from being added to the queue, you need to set the sync configuration option to True when the tests are running. You can find more info about the configuration here.
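With django-q that option lives in the Q_CLUSTER settings dict, so a test-settings sketch might look like this (the cluster name is illustrative):

# settings used when running tests
Q_CLUSTER = {
    'name': 'myproject',  # illustrative cluster name
    'sync': True,         # run async_task calls immediately, in-process
}

With 'sync': True, async_task('task_b') executes task_b synchronously instead of enqueuing it, so nothing is handed to the cluster during tests.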

Does celery task id change after redistribution

I have a Django model which has a column called celery_task_id. I am using RabbitMQ as the broker. There's a celery task called test_celery which takes a model object as a parameter. The following code creates a celery task:
def create_celery_task():
    celery_task_id = test_celery.apply_async((model_obj,), eta='Future Datetime Object')
    model_obj.celery_task_id = celery_task_id
    model_obj.save()
----
----
Now inside the celery task I check whether the task id matches the one stored in the DB:
@app.task
def test_celery(model_obj):
    if model_obj.celery_task_id == test_celery.request.id:
        ...  # Do something
My problem is that in many cases I can see the task being received and succeeding in the log, but the code inside the if condition never executes.
Is it possible that the celery task id changes after redistribution? Or are there other reasons?
One recommendation is not to pass database/ORM objects into celery tasks, because they may contain stale data by the time the task runs. Try rewriting the task as:
@app.task
def test_celery(model_obj_id):
    # Re-fetch the row; .first() returns None if it no longer exists.
    model_obj = YourModel.objects.filter(id=model_obj_id).first()
    if model_obj:
        if model_obj.celery_task_id == test_celery.request.id:
            ...  # Do something
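And pass only the primary key when scheduling it. A sketch of the call site (future_datetime stands in for the question's 'Future Datetime Object'):

def create_celery_task():
    result = test_celery.apply_async((model_obj.pk,), eta=future_datetime)
    # apply_async returns an AsyncResult; store its .id (the task id
    # string) rather than the AsyncResult object itself.
    model_obj.celery_task_id = result.id
    model_obj.save()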

Haystack with Elasticsearch: Unit test gets different results when run in isolation

I have two different tests, and both fail when run alongside other tests. I'll show one of them here. This test verifies that synonyms are working. I've got the following synonyms in my synonym.txt file:
knife, machete
bayonet, dagger, sword
The unit test looks like this:
def test_synonyms(self):
    """
    Test that synonyms are working
    """
    user = UserFactory()
    SubscriberFactory.create(user=user)
    descriptions = [
        'bayonet',
        'dagger',
        'sword',
        'knife',
        'machete',
    ]
    for desc in descriptions:
        ListingFactory.create(user=user,
                              description="Great {0} for all of your undertakings".format(desc))
    call_command('update_index', settings.LISTING_INDEX, using=[settings.LISTING_INDEX])
    self.sqs = SearchQuerySet().using(settings.LISTING_INDEX)
    self.assertEqual(self.sqs.count(), 5)
    # 3 of the 5 are in one synonym group, 2 in the other
    self.assertEqual(self.sqs.auto_query('bayonet').count(), 3)
    self.assertEqual(self.sqs.auto_query('dagger').count(), 3)
    self.assertEqual(self.sqs.auto_query('sword').count(), 3)
    # 2 of the 5 in this group
    self.assertEqual(self.sqs.auto_query('knife').count(), 2)
    self.assertEqual(self.sqs.auto_query('machete').count(), 2)
The problem is that when I run the test in isolation with ./manage.py test AnalyzersTestCase.test_synonyms, it works fine. But if I run it along with other tests, it fails, returning 1 result where it should return 3. If I run a raw elasticsearch query at that point, elasticsearch returns 1 result. So it must be something in the setup of the index... but I'm deleting the index in the setUp() method, so I don't see how it can be in a different state when run in isolation versus alongside other tests.
Any help you can give would be great.
Figured it out...
Haystack's connections singleton needed to be cleared between tests, so:
import haystack

for key, opts in haystack.connections.connections_info.items():
    haystack.connections.reload(key)
call_command('clear_index', interactive=False, verbosity=0)
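One sketch of wiring that into a shared base class so every test starts with a clean index (the class name is illustrative):

import haystack
from django.core.management import call_command
from django.test import TestCase

class SearchTestCase(TestCase):
    def setUp(self):
        super(SearchTestCase, self).setUp()
        # Drop Haystack's cached backend connections and wipe the
        # index so state can't leak between tests.
        for key, opts in haystack.connections.connections_info.items():
            haystack.connections.reload(key)
        call_command('clear_index', interactive=False, verbosity=0)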

How do you skip a unit test in Django?

How do you forcibly skip a unit test in Django?
@skipIf and @skipUnless are all I found, but I just want to skip a test right now for debugging purposes while I get a few things straightened out.
Python's unittest module has a few decorators for this.
There is plain old @skip:
from unittest import skip

@skip("Don't want to test")
def test_something():
    ...
If you can't use @skip for some reason, @skipIf should work. Just trick it into always skipping with the argument True:
from unittest import skipIf

@skipIf(True, "I don't want to run this test yet")
def test_something():
    ...
See the unittest docs on skipping tests.
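Both decorators also work at class level if you want to skip every test in a TestCase while debugging:

from unittest import skip
from django.test import TestCase

@skip("Skipping the whole case while debugging")
class SampleTestCase(TestCase):
    def test_something(self):
        ...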
If you are looking to simply not run certain test files, the best way is probably to use fab or another tool to run only particular tests.
Django 1.10 allows the use of tags for unit tests. You can then use the --exclude-tag=tag_name flag to exclude certain tags:
from django.test import tag

class SampleTestCase(TestCase):

    @tag('fast')
    def test_fast(self):
        ...

    @tag('slow')
    def test_slow(self):
        ...

    @tag('slow', 'core')
    def test_slow_but_core(self):
        ...
In the above example, to exclude your tests with the "slow" tag you would run:
$ ./manage.py test --exclude-tag=slow

Django-celery task and django transaction

I have a question regarding transactions and celery tasks. It's no mystery that if a transaction and a celery task access the same table/records, there can be a race condition.
However, consider the following piece of code:
def f(self):
    # method of a model that inherits from models.Model
    self.field_a = datetime.now()
    self.save()
    transaction.commit_unless_managed()
    # depending on the configuration of this model,
    # this might return None or a datetime object
    eta = self.get_task_eta()
    if eta:
        celery_task_do_something.apply_async(args=(self.pk, self.__class__),
                                             eta=eta)
    else:
        celery_task_do_something.delay(self.pk, self.__class__)
Here's the celery task:
def celery_task_do_something(pk, cls):
    o = cls.objects.get(pk=pk)
    if o.field_a:
        # perform something
        return True
    return False
As you can see, before creating the task we call transaction.commit_unless_managed and it should commit, since django transaction is not currently managed.
However, when running celery task the field field_a is not set.
My question:
Since we do commit before creating the task, is it still possible that there's a race condition?
Additional info
We're using Postgres version 9.1
Every transaction is run with READ COMMITTED isolation level
On a different db with engine dowant.lib.db.backends.postgresql_psycopg2_debugger field_a is already set and the task works as expected. With engine dowant.lib.db.backends.postgresql_psycopg2_hstore_ready the described issue appears (not sure if it's related with the engine).
Celery version is 2.2
I tried different databases. Still the same behavior, except when the engines change. So that's why I mentioned this.
Thanks a lot.
Try adding self.__class__.objects.select_for_update().get(pk=self.pk) before the save and see what happens.
It should block writes and other locking reads of this row until the commit is done.
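In the context of the question's f method, that suggestion would look roughly like this (rest of the method unchanged):

def f(self):
    # Acquire a row lock; concurrent writers and other
    # select_for_update() readers wait until our commit.
    self.__class__.objects.select_for_update().get(pk=self.pk)
    self.field_a = datetime.now()
    self.save()
    transaction.commit_unless_managed()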
This is late, but since Django 1.9 you can defer queueing the task until after the current transaction commits:
transaction.on_commit(lambda: enqueue_atask())
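Applied to the question's code, a sketch would be:

from django.db import transaction

def f(self):
    self.field_a = datetime.now()
    self.save()
    # Enqueue only after the surrounding transaction commits, so the
    # worker is guaranteed to see field_a.
    transaction.on_commit(
        lambda: celery_task_do_something.delay(self.pk, self.__class__))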