Flask, Celery, Pytest, and apply_async countdown confusion

I'm sure this is noted somewhere, but I'm spinning my wheels, so hopefully someone on here can help.
I'm working with a Flask app that uses Celery and I'm testing with Pytest. Here are the two tests:
def test_celery_tasks_countdown_apply_async(celery_app, celery_worker):
    r1 = add.apply_async((4, 4), countdown=2)
    r2 = multiply.apply_async((4, 4), countdown=2)
    assert r1.get(timeout=10) == 8
    assert r2.get(timeout=10) == 16

def test_celery_tasks_countdown_signatures(celery_app, celery_worker):
    r1 = add.s(4, 4).apply_async(countdown=2)
    r2 = multiply.s(4, 4).apply_async(countdown=2)
    assert r1.get(timeout=10) == 8
    assert r2.get(timeout=10) == 16
The actual tasks are like so:
@shared_task
def add(x, y):
    return x + y

@shared_task()
def multiply(x, y):
    return x * y
These pass if I run them one by one.
pytest tests/test_tasks.py::test_celery_tasks_countdown_apply_async
pytest tests/test_tasks.py::test_celery_tasks_countdown_signatures
But if I run them together (by running the whole test_tasks.py file), they both fail.
pytest tests/test_tasks.py
I've got some other tests (e.g., for delay) that work. And if I remove the countdown option from these, they both pass when run together.
Why does using the countdown option and running these tests together cause failure?
Right now, my fixture in conftest.py looks like this:
pytest_plugins = ('celery.contrib.pytest',)

@pytest.fixture(scope='session')
def celery_config():
    return {
        'broker_url': 'redis://localhost:8001',
        'result_backend': 'redis://localhost:8001',
        'task_always_eager': False,
    }
UPDATE
I'm leaving this question up as I believe it's a valid question that needs some documentation. While I still don't understand the issue or know how to resolve it within the confines of my automated tests, I have succeeded in getting my tasks to run locally outside of pytest.

This is not a complete answer, more of a workaround that gets the results saved to the backend and the tests to pass. But the tasks aren't actually sent to the queue, and the countdown option seems to be ignored, neither of which is what I'm looking for.
@pytest.fixture()
def celery_config():
    return {
        'broker_url': 'redis://localhost:8001',
        'result_backend': 'redis://localhost:8001',
        'task_always_eager': True,
        'task_ignore_result': False,
        'task_store_eager_result': True,
    }
Reference here.
I'm not sure I want to use these in production, so this is just making my tests pass. I still would love to know why this is all happening.
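For what it's worth, the eager behaviour above is documented: with task_always_eager enabled, apply_async() executes the task locally in the calling process, and scheduling options such as countdown are ignored. A minimal sketch (reusing the add task from the question) of why these tests now pass without a real worker:

import time

def test_add_runs_inline_when_eager(celery_app):
    # With task_always_eager=True, apply_async() blocks and runs the
    # task in-process; the 2-second countdown is not honoured.
    start = time.monotonic()
    result = add.apply_async((4, 4), countdown=2)
    elapsed = time.monotonic() - start
    assert result.get(timeout=10) == 8
    assert elapsed < 2  # the countdown was skipped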

Related

Run celery task when testing (pytest) in Django

I have three Celery tasks:
@celery_app.task
def load_rawdata_on_monday():
    if not load_rawdata():  # run synchronously
        notify_rawdata_was_not_updated.delay()

@celery_app.task
def load_rawdata():
    # load and process file from FTP
    return False  # some error happened

@celery_app.task
def notify_rawdata_was_not_updated():
    pass  # send email by Django
I need to test that an email is sent if the load_rawdata task (function) returns False. For that I have written the following test, which does not work:
@override_settings(EMAIL_BACKEND='django.core.mail.backends.locmem.EmailBackend')
@override_settings(CELERY_ALWAYS_EAGER=False)
@patch('load_rawdata', MagicMock(return_value=False))
def test_load_rawdata_on_monday():
    load_rawdata_on_monday()
    assert len(mail.outbox) == 1, "Inbox is not empty"
    assert mail.outbox[0].subject == 'Subject here'
    assert mail.outbox[0].body == 'Here is the message.'
    assert mail.outbox[0].from_email == 'from@example.com'
    assert mail.outbox[0].to == ['to@example.com']
It seems notify_rawdata_was_not_updated is still being run asynchronously. How do I write a proper test?
It looks like two things may be happening:
You should call your task using the apply() method to run it synchronously.
The CELERY_ALWAYS_EAGER setting should be active to allow subsequent task calls to be executed as well.
@override_settings(EMAIL_BACKEND='django.core.mail.backends.locmem.EmailBackend')
@override_settings(CELERY_ALWAYS_EAGER=True)
@patch('load_rawdata', MagicMock(return_value=False))
def test_load_rawdata_on_monday():
    load_rawdata_on_monday.apply()
    assert len(mail.outbox) == 1, "Inbox is not empty"
    assert mail.outbox[0].subject == 'Subject here'
    assert mail.outbox[0].body == 'Here is the message.'
    assert mail.outbox[0].from_email == 'from@example.com'
    assert mail.outbox[0].to == ['to@example.com']
While @tinom9 is correct about using the apply() method, the issue of notify_rawdata_was_not_updated still running asynchronously has to do with your task definition:
@celery_app.task
def load_rawdata_on_monday():
    if not load_rawdata():
        notify_rawdata_was_not_updated.delay()  # delay is an async invocation
try this:
@celery_app.task
def load_rawdata_on_monday():
    if not load_rawdata():
        notify_rawdata_was_not_updated.apply()  # run on the local thread
And for the test, calling load_rawdata_on_monday() without .delay() or .apply() should still execute the task locally and block until the result returns. Just make sure you are handling the return values correctly: some Celery invocation methods, such as apply(), return a celery.result.EagerResult instance, while delay() and apply_async() return a celery.result.AsyncResult instance. That may not give you the desired outcome if you expect False when you check if not load_rawdata(), or anywhere else you try to get the return value of the function rather than the task itself.
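To make the distinction concrete, here is a small illustration (a sketch; the task body is from the question, the surrounding asserts are mine):

# Calling the plain function runs the body and returns its value directly:
assert load_rawdata() is False

# apply() runs inline, but wraps the value in an EagerResult object;
# the actual return value is only available behind .get():
eager = load_rawdata.apply()
assert eager.get() is False

# delay()/apply_async() return an AsyncResult instead (and need a broker
# or an eager setting); the value is likewise only available via .get().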
I should have used CELERY_TASK_ALWAYS_EAGER instead of CELERY_ALWAYS_EAGER.
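For reference, on Celery 4+ the Django-namespaced setting makes the override look like this (a sketch; it assumes your Celery app reads its configuration from Django settings with the CELERY_ namespace):

from django.test import override_settings

@override_settings(CELERY_TASK_ALWAYS_EAGER=True)  # Celery 4+ name
def test_load_rawdata_on_monday():
    load_rawdata_on_monday.apply()
    assert len(mail.outbox) == 1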

The pytest test always returns abc.is_online True

The test always returns abc.is_online as True, but it should be False because the Celery task sets is_online to False once 60 seconds have passed since last_active.
Error message:
assert True == False
  where True = <ABC: Charles Reyes>.is_online
@app.task(name="task.abc_last_active")  # Celery task
def abc_last_active():
    now = timezone.localtime()
    for xyz in ABC.objects.all():
        if not xyz.last_active:
            continue
        elapsed = now - xyz.last_active
        if elapsed.total_seconds() >= settings.ABC_TIMEOUT:  # 60 sec
            xyz.is_online = False
            xyz.save()
@pytest.fixture
def create_abc():
    abc = ABC.objects.create(
        phone="123234432",
        location=Point(1, 4),
        last_active=timezone.localtime() - timezone.timedelta(seconds=162),
        is_online=True,
    )
    return abc
@pytest.mark.django_db
def test_inactive_abc_gets_deactivated(create_abc):
    print(create_abc.is_online, "before deactivation")
    abc_last_active()
    print(create_abc.is_online, "after deactivation")
    assert create_abc.is_online == False
Use create_abc.refresh_from_db() after running that Celery task. Django doesn't reload an in-memory object from the database every time the underlying row changes.
EDIT: improved the answer to clarify what's going on here.
You create a new object in memory here (which is then saved to the DB):
def create_abc():
    ...
Then the Celery task fetches its own copy of the row from the DB, updates it, and saves it; the create_abc object held by the test closure still has the old values at that point. After the task completes, the fixture object in memory knows nothing about the updated row in the DB, so you must load the fresh DB state into your old object by calling refresh_from_db() on it.
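Concretely, the fix is one extra line in the test above:

@pytest.mark.django_db
def test_inactive_abc_gets_deactivated(create_abc):
    abc_last_active()
    create_abc.refresh_from_db()  # reload the row the task just updated
    assert create_abc.is_online == False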
For tests, instead of refreshing (you can forget to refresh), I would recommend writing assertions that explicitly use fresh instances, and making a habit of it to avoid problems in the future. For example:
@pytest.mark.django_db
def test_inactive_abc_gets_deactivated(create_abc):
    ...
    assert ABC.objects.first().is_online == False

Using mock to test if directory exists or not

I have been exploring mock and pytest for a few days now.
I have the following method:
def func():
    if not os.path.isdir('/tmp/folder'):
        os.makedirs('/tmp/folder')
In order to unit test it, I have decided to patch os.path.isdir and os.makedirs, as shown:
@patch('os.path.isdir')
@patch('os.makedirs')
def test_func(patch_makedirs, patch_isdir):
    patch_isdir.return_value = False
    assert patch_makedirs.called == True
The assertion fails, irrespective of the return value from patch_isdir. Can someone please help me figure out where I am going wrong?
Can't say for sure without seeing the complete code, but I have the feeling it's related to where you're patching.
You should patch the os module that was imported by the module under test.
So, if you have it like this:
mymodule.py:
import os

def func():
    if not os.path.isdir('/tmp/folder'):
        os.makedirs('/tmp/folder')
you should write your test_mymodule.py like this:
@patch('mymodule.os')
def test_func(os_mock):
    os_mock.path.isdir.return_value = False
    mymodule.func()  # exercise the code under test
    assert os_mock.makedirs.called
Note that this specific test is not that useful, since it's essentially testing whether the os module works -- and you can probably assume that it is well tested. ;)
Your tests would probably be better focused on your application logic (maybe the code that calls func?).
You are missing the call to func().
@patch('os.path.isdir')
@patch('os.makedirs')
def test_func(patch_makedirs, patch_isdir):
    patch_isdir.return_value = False
    yourmodule.func()
    assert patch_makedirs.called == True
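For what it's worth, a slightly stronger check than .called is Mock's assert_called_once_with(), which also verifies the path argument (a sketch; yourmodule stands in for the module that defines func):

@patch('os.makedirs')
@patch('os.path.isdir', return_value=False)
def test_func_creates_missing_dir(patch_isdir, patch_makedirs):
    # Decorators apply bottom-up, so the isdir mock is the first argument.
    yourmodule.func()
    patch_makedirs.assert_called_once_with('/tmp/folder')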

how to make flask pass a generator to task such as celery

I have a bunch of code working correctly in Flask, but these requests can take over 30 minutes to finish. I am chaining generators so my existing code can yield output to the browser as it runs.
Since these tasks take 30 minutes or more to complete, I want to offload them, but I am at a loss. I have not successfully gotten celery/rabbitmq/redis or any other combination to work correctly, and I am looking for how to accomplish this so my page returns right away and I can check in the background whether the task is complete.
Here is example code that works for now but takes 4 seconds of processing for the page to return.
I am looking for advice on how to get around this problem: can celery/redis or rabbitmq deal with generators like this? Should I be looking at a different solution?
Thanks!
import time
import flask
from itertools import chain

class TestClass(object):
    def __init__(self):
        self.a = 4

    def first_generator(self):
        b = self.a + 2
        yield str(self.a) + '\n'
        time.sleep(1)
        yield str(b) + '\n'

    def second_generator(self):
        time.sleep(1)
        yield '5\n'

    def third_generator(self):
        time.sleep(1)
        yield '6\n'

    def application(self):
        return chain(tc.first_generator(),
                     tc.second_generator(),
                     tc.third_generator())

tc = TestClass()
app = flask.Flask(__name__)

@app.route('/')
def process():
    return flask.Response(tc.application(), mimetype='text/plain')

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000, debug=True)
Firstly, it's not clear what it would even mean to "pass a generator to Celery". The whole point of Celery is that it is not directly linked to your app: it's a completely separate thing, maybe even running on a separate machine, to which you would pass some fixed data. You can of course pass the initial parameters and get Celery itself to call the functions that create the generators for processing, but you can't drip-feed data to Celery.
Secondly, this is not at all an appropriate use for Celery in any case. Celery is for offline processing. You can't get it to return stuff to a waiting request. The only thing you could do would be to get it to save the results somewhere accessible by Flask, and then get your template to fire an Ajax request to get those results when they are available.
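A sketch of that pattern (the Celery wiring, URLs, and Redis address are illustrative assumptions, not from the question; tc is the instance from the example above): the task consumes the generator chain in the worker and stores the joined output in the result backend; one endpoint launches it and returns immediately, another lets the page poll for the result.

import flask
from celery import Celery
from celery.result import AsyncResult

app = flask.Flask(__name__)
celery = Celery(__name__,
                broker='redis://localhost:6379/0',
                backend='redis://localhost:6379/0')

@celery.task
def run_application():
    # Generators can't be serialized and sent over the broker, so the
    # worker consumes the chain itself and returns the final text.
    return ''.join(tc.application())

@app.route('/start')
def start():
    task = run_application.delay()
    return {'task_id': task.id}  # returns right away

@app.route('/status/<task_id>')
def status(task_id):
    res = AsyncResult(task_id, app=celery)
    if res.ready():
        return {'state': res.state, 'output': res.get()}
    return {'state': res.state}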

django-nose unit testing a celery task ... missing database data

I'm writing unit tests for a celery task using django-nose. It's fairly typical; a blank test database (REUSE_DB=0) that is pre-populated via a fixture at test time.
The problem I have is that even though the TestCase is loading the fixture and I can access the objects from the test method, the same query fails when executed within an async celery task.
I've checked that settings.DATABASES["default"]["NAME"] is the same in both the test method and the task under test. I've also validated that the task under test behaves correctly when invoked as a regular method call.
And that's about where I'm out of ideas.
Here's a sample:
class MyTest(TestCase):
    fixtures = ['test_data.json']

    def setUp(self):
        settings.CELERY_ALWAYS_EAGER = True  # seems to be required; if not, I get socket errors for Rabbit
        settings.CELERY_EAGER_PROPAGATES_EXCEPTIONS = True  # exposes errors in the code under test

    def test_city(self):
        self.assertIsNotNone(City.objects.get(name='brisbane'))
        myTask.delay(city_name='brisbane').get()
        # The following works fine: myTask('brisbane')

from celery.task import task

@task()
def myTask(city_name):
    c = City.objects.count()  # gives 0
    my_city = City.objects.get(name=city_name)  # raises DoesNotExist
    return
This sounds a lot like a bug in django-celery 2.5 which was fixed in 2.5.2: https://github.com/celery/django-celery/pull/116
The brief description of the bug is that the django-celery loader was closing the DB connection prior to executing the task, even for eager tasks. Since the tests run inside a transaction, the new connection opened for the task execution can't see the data created in setUp.
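If upgrading django-celery to 2.5.2 or later isn't an option, one workaround (my suggestion, not part of the original answer) is TransactionTestCase: it commits fixture data rather than holding it in an open transaction, so a fresh DB connection opened for the task can see it.

from django.test import TransactionTestCase

class MyTaskTest(TransactionTestCase):
    # Fixture rows are committed, not wrapped in the test's transaction,
    # so they are visible to any new connection the task loader opens.
    fixtures = ['test_data.json']

    def test_city(self):
        myTask.delay(city_name='brisbane').get()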