Run Celery task when testing (pytest) in Django

I have three Celery tasks:
@celery_app.task
def load_rawdata_on_monday():
    if not load_rawdata():  # run synchronously
        notify_rawdata_was_not_updated.delay()

@celery_app.task
def load_rawdata():
    # load and process file from FTP
    return False  # some error happened

@celery_app.task
def notify_rawdata_was_not_updated():
    pass  # send email by Django
I need to test that an email is sent if the load_rawdata task (function) returns False. For that I have written the following test, which does not work:
@override_settings(EMAIL_BACKEND='django.core.mail.backends.memcache.EmailBackend')
@override_settings(CELERY_ALWAYS_EAGER=False)
@patch('load_rawdata', MagicMock(return_value=False))
def test_load_rawdata_on_monday():
    load_rawdata_on_monday()
    assert len(mail.outbox) == 1, "Inbox is not empty"
    assert mail.outbox[0].subject == 'Subject here'
    assert mail.outbox[0].body == 'Here is the message.'
    assert mail.outbox[0].from_email == 'from@example.com'
    assert mail.outbox[0].to == ['to@example.com']
It seems notify_rawdata_was_not_updated is still being run asynchronously.
How do I write a proper test?

It looks like two things need to change:
Call your task using the apply() method so it runs synchronously.
The CELERY_ALWAYS_EAGER setting should be enabled so that subsequent task calls are executed eagerly as well.
@override_settings(EMAIL_BACKEND='django.core.mail.backends.locmem.EmailBackend')  # locmem is the in-memory backend that populates mail.outbox
@override_settings(CELERY_ALWAYS_EAGER=True)
@patch('load_rawdata', MagicMock(return_value=False))
def test_load_rawdata_on_monday():
    load_rawdata_on_monday.apply()
    assert len(mail.outbox) == 1, "Inbox is not empty"
    assert mail.outbox[0].subject == 'Subject here'
    assert mail.outbox[0].body == 'Here is the message.'
    assert mail.outbox[0].from_email == 'from@example.com'
    assert mail.outbox[0].to == ['to@example.com']

While @tinom9 is correct about using the apply() method, the issue of notify_rawdata_was_not_updated still running asynchronously comes from your task definition:

@celery_app.task
def load_rawdata_on_monday():
    if not load_rawdata():
        notify_rawdata_was_not_updated.delay()  # delay is an async invocation

Try this:

@celery_app.task
def load_rawdata_on_monday():
    if not load_rawdata():
        notify_rawdata_was_not_updated.apply()  # run on the local thread
And for the test, calling load_rawdata_on_monday() without .delay() or .apply() will still execute the task locally and block until it returns. Just make sure you handle the return values correctly: some Celery invocation methods, like apply(), return a celery.result.EagerResult instance, while delay() and apply_async() return a celery.result.AsyncResult instance. That may not give you the expected outcome when you check if not load_rawdata(), or anywhere else you rely on the function's return value rather than on the task result.
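To make that distinction concrete, here is a small sketch (assuming eager, in-process execution and the task names from the question) of what each invocation style gives you back:

# Plain call: runs the function body directly and returns its value (False here).
value = load_rawdata()

# apply(): also runs inline, but wraps the value in an EagerResult.
result = load_rawdata.apply()
value = result.get()  # -> False

# delay() / apply_async(): returns an AsyncResult; unless a worker (or eager mode)
# processes the task, its return value never reaches the calling process.
async_result = load_rawdata.delay()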

I should have used CELERY_TASK_ALWAYS_EAGER instead of CELERY_ALWAYS_EAGER.
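For reference, a minimal sketch of how the final test might look with the newer setting name. The myapp.tasks import and patch target are hypothetical stand-ins for your real module path, and locmem is the in-memory email backend that actually populates mail.outbox. Depending on how your Celery app reads Django settings, you may need to enable task_always_eager on the Celery app used by the test run instead of relying on override_settings:

from unittest.mock import MagicMock, patch

from django.core import mail
from django.test import override_settings

from myapp.tasks import load_rawdata_on_monday  # hypothetical module path


@override_settings(EMAIL_BACKEND='django.core.mail.backends.locmem.EmailBackend')
@override_settings(CELERY_TASK_ALWAYS_EAGER=True)
@patch('myapp.tasks.load_rawdata', MagicMock(return_value=False))  # hypothetical patch target
def test_load_rawdata_on_monday():
    load_rawdata_on_monday.apply()
    assert len(mail.outbox) == 1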

Related

Flask, Celery, Pytest, and apply_async countdown confusion

I'm sure this is noted somewhere, but I'm spinning my wheels, so hopefully someone on here can help.
I'm working with a Flask app that uses Celery and I'm testing with Pytest. Here are the two tests:
def test_celery_tasks_countdown_apply_async(celery_app, celery_worker):
    r1 = add.apply_async((4, 4), countdown=2)
    r2 = multiply.apply_async((4, 4), countdown=2)
    assert r1.get(timeout=10) == 8
    assert r2.get(timeout=10) == 16

def test_celery_tasks_countdown_signatures(celery_app, celery_worker):
    r1 = add.s(4, 4).apply_async(countdown=2)
    r2 = multiply.s(4, 4).apply_async(countdown=2)
    assert r1.get(timeout=10) == 8
    assert r2.get(timeout=10) == 16
The actual tasks are like so:
@shared_task
def add(x, y):
    return x + y

@shared_task()
def multiply(x, y):
    return x * y
These pass if I run them one by one.
pytest tests/test_tasks.py::test_celery_tasks_countdown_apply_async
pytest tests/test_tasks.py::test_celery_tasks_countdown_signatures
But if I run them together (by calling the whole test_tasks.py file), they both fail.
pytest tests/test_tasks.py
I've got some other tests (e.g., for delay) that work. And if I remove the countdown option from these two, they both pass when run together.
Why does using the countdown option and running these tests together cause failure?
Right now, my conftest.py looks like this:

pytest_plugins = ('celery.contrib.pytest', )

@pytest.fixture(scope='session')
def celery_config():
    return {
        'broker_url': 'redis://localhost:8001',
        'result_backend': 'redis://localhost:8001',
        'task_always_eager': False,
    }
UPDATE
I'm leaving this question up as I believe it's a valid question that needs some documentation. While I still don't understand the issue or know how to resolve it within the confines of my automated tests, I have succeeded in getting my tasks to run locally outside of pytest.
What follows is not a complete answer but a workaround that gets the results saved to the backend and the tests to complete. However, the tasks aren't actually sent to the queue and the countdown option seems to be ignored, neither of which is what I'm looking for.
@pytest.fixture()
def celery_config():
    return {
        'broker_url': 'redis://localhost:8001',
        'result_backend': 'redis://localhost:8001',
        'task_always_eager': True,
        'task_ignore_result': False,
        'task_store_eager_result': True,
    }
Reference here.
I'm not sure I want to use these in production, so this is just making my tests pass. I still would love to know why this is all happening.
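With that workaround config in place, a test along these lines passes because the task runs inline in the test process. This is only a sketch using the add task from above, and note that countdown is effectively ignored in eager mode:

def test_add_eager(celery_app):
    # task_always_eager=True: the task executes synchronously right here,
    # so the countdown delay is not honoured.
    result = add.apply_async((4, 4), countdown=2)
    assert result.successful()
    assert result.get(timeout=10) == 8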

The pytest test always returns abc.is_online True

The test always returns abc.is_online as True, but abc.is_online should be False because the Celery task sets is_online to False once 60 seconds have passed since last_active.
Error message:
assert True == False
  where True = <ABC: Charles Reyes>.is_online
@app.task(name="task.abc_last_active")  # Celery task
def abc_last_active():
    now = timezone.localtime()
    for xyz in ABC.objects.all():
        if not xyz.last_active:
            continue
        elapsed = now - xyz.last_active
        if elapsed.total_seconds() >= settings.ABC_TIMEOUT:  # 60 sec
            xyz.is_online = False
            xyz.save()
@pytest.fixture
def create_abc():
    abc = ABC.objects.create(
        phone="123234432",
        location=Point(1, 4),
        last_active=timezone.localtime() - timezone.timedelta(seconds=162),
        is_online=True,
    )
    return abc
@pytest.mark.django_db
def test_inactive_abc_gets_deactivated(create_abc):
    print(create_abc.is_online, "before deactivation")
    abc_last_active()
    print(create_abc.is_online, "after deactivation")
    assert create_abc.is_online == False
Use create_abc.refresh_from_db() after running that Celery task. Django does not re-query the database every time the object changes, so the in-memory instance keeps its old field values.
EDIT: improved the answer to clarify what's going on here.
You create a new object in memory here (which is then saved to the DB):

def create_abc():
    ...

Then the Celery task loads its own copy of that row from the database and updates it; the create_abc instance held in the test closure still has the old values at that point.
After the task completes, the fixture object in memory knows nothing about the updated row in the DB, so you must reload it by calling refresh_from_db() on the old object.
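A minimal sketch of the test with the refresh added:

@pytest.mark.django_db
def test_inactive_abc_gets_deactivated(create_abc):
    abc_last_active()
    create_abc.refresh_from_db()  # reload the row the task just updated
    assert create_abc.is_online is False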
For tests, rather than refreshing (it's easy to forget), I would recommend writing assertions that explicitly use fresh instances, and making a habit of it to avoid such problems in the future. I.e.:
@pytest.mark.django_db
def test_inactive_abc_gets_deactivated(create_abc):
    ...
    assert ABC.objects.first().is_online == False

Concurrency issue or something else? .save() method + DB timing

So the situation is this:
I have an endpoint A that creates data and calls .save() on it (call this functionA), and which also sends a POST request to an external third-party API that will in turn call my endpoint B (call this functionB):
def functionA(request):
    try:
        with transaction.atomic():
            newData = Blog(title="new blog")
            newData.save()
            # findSavedBlog = Blog.objects.get(title="new blog")
            # print(findSavedBlog)
            # This post request triggers the third party to send a post request
            # to the endpoint that calls functionB.
            r = requests.post('www.thirdpartyapi.com/confirm_blog_creation/', some_data)
            return HttpResponse("Result was: " + str(r.status_code))
    except Exception:
        pass  # (error handling omitted in the question)

def functionB(request):
    blogTitle = request.POST.get('blog_title')  # assume this evaluates to 'new blog'
    # sleep(20)
    try:
        findBlog = Blog.objects.get(title=blogTitle)  # again, the same as Blog.objects.get(title="new blog")
    except ObjectDoesNotExist as e:
        print("Blog not found!")
If I uncomment the findSavedBlog portion of functionA, it prints the saved blog, but functionB still fails.
If I add a sleep to functionB to wait for the DB to finish writing before trying to fetch the newly created data, it still fails anyway.
Can anyone with knowledge of Django's .save() method and/or concurrency help me out here? Much appreciated, thanks!
EDIT:
The issue was that I was wrapping all of functionA in an atomic block (I forgot to include that part of functionA originally), which means the transaction doesn't commit until after functionA returns!
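If the external call must stay inside functionA, one way to keep the atomic block and still guarantee that the third party only calls back after the new row is visible is Django's transaction.on_commit() hook. A sketch reusing the names from the question (some_data is whatever payload you already send); note the response can no longer be returned to the caller, since the request now fires after the view's transaction commits:

from django.db import transaction
from django.http import HttpResponse
import requests

def functionA(request):
    with transaction.atomic():
        new_blog = Blog(title="new blog")
        new_blog.save()
        # Deferred until the surrounding transaction has committed, so
        # endpoint B will be able to see the row when the callback arrives.
        transaction.on_commit(
            lambda: requests.post('www.thirdpartyapi.com/confirm_blog_creation/',
                                  data=some_data)
        )
    return HttpResponse("Blog created; confirmation request queued")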

Twisted - how to make lots of Python code non-blocking

I've been trying to get this script to perform the code in hub() in the written order.
hub() contains a mix of standard Python code and requests to carry out I/O using Twisted and Crossbar.
However, because the Python code is blocking, the reactor doesn't get a chance to carry out those publish calls, and my frontend receives all the published messages at the end.
This code is a massively simplified version of what I'm actually dealing with. The real script (hub() and the other methods it calls) is over 1500 lines long, and modifying all those functions to make them non-blocking is not ideal. I'd rather isolate the changes to a few methods like publish(), if that can fix this problem.
I have played around with async, await, deferLater, loopingCall, and others, but have not yet found an example that helps in my situation.
Is there a way to modify publish() (or hub()) so they send out the messages in order?
from autobahn.twisted.component import Component, run
from twisted.internet.defer import inlineCallbacks, returnValue
from twisted.internet import reactor, defer

component = Component(
    transports=[
        {
            u"type": u"websocket",
            u"url": u"ws://127.0.0.1:8080/ws",
            u"endpoint": {
                u"type": u"tcp",
                u"host": u"localhost",
                u"port": 8080,
            },
            u"options": {
                u"open_handshake_timeout": 100,
            }
        },
    ],
    realm=u"realm1",
)

@component.on_join
@inlineCallbacks
def join(session, details):
    print("joined {}: {}".format(session, details))

    def publish(context='output', value='default'):
        """ Publish a message. """
        print('publish', value)
        session.publish(u'com.myapp.universal_feedback', {"id": context, "value": value})

    def hub(thing):
        """ Main script. """
        do_things
        publish('output', 'some data for you')
        do_more_things
        publish('status', 'a progress message')
        do_even_more_things
        publish('status', 'some more data')
        do_all_the_things
        publish('other', 'something else')

    try:
        yield session.register(hub, u'com.myapp.hello')
        print("procedure registered")
    except Exception as e:
        print("could not register procedure: {0}".format(e))

if __name__ == "__main__":
    run([component])
    reactor.run()
Your join() function is async (decorated with @inlineCallbacks and containing at least one yield in the body).
Internally it registers the function hub() as a WAMP RPC; hub(), however, is not async.
Also, the calls to session.publish() are not yielded, as async calls should be.
Result: you add a bunch of events to the event loop but never await them, so they are only flushed when the event loop gets control back, at application shutdown.
You need to make your hub and publish functions async:
@inlineCallbacks
def publish(context='output', value='default'):
    """ Publish a message. """
    print('publish', value)
    yield session.publish(u'com.myapp.universal_feedback', {"id": context, "value": value})

@inlineCallbacks
def hub(thing):
    """ Main script. """
    do_things
    yield publish('output', 'some data for you')
    do_more_things
    yield publish('status', 'a progress message')
    do_even_more_things
    yield publish('status', 'some more data')
    do_all_the_things
    yield publish('other', 'something else')
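If converting the whole 1500-line hub() to inlineCallbacks is not practical, an alternative sketch (not from the original answer) is to leave the heavy code blocking but push it onto a worker thread with deferToThread, and have publish() hand each message back to the reactor thread with callFromThread so it goes out immediately. Here session is the WAMP session from join(), and blocking_hub_body is a hypothetical wrapper around your existing synchronous code:

from twisted.internet import reactor, threads

def publish(context='output', value='default'):
    """ Called from the worker thread: schedule the publish on the reactor thread. """
    print('publish', value)
    reactor.callFromThread(
        session.publish,
        u'com.myapp.universal_feedback',
        {"id": context, "value": value},
    )

def hub(thing):
    """ Run the long blocking body in a thread so the reactor stays responsive. """
    return threads.deferToThread(blocking_hub_body, thing)  # blocking_hub_body wraps the existing code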

Celery chain not executing tasks after executing a group

I'm using RabbitMQ version "3.5.7" and Celery 4.0.2 in my project.
This is the code which creates the Celery chain in this file:
@app.route('/transcodeALL', methods=['POST'])
def transcodeToALL():
    if request.method == 'POST':
        # We will do something like this to simulate actual processing of a video
        transcoding_tasks = group(
            transcode_1080p.signature(queue='tasks', priority=1, immutable=True),
            transcode_720p.signature(queue='tasks', priority=2, immutable=True),
            transcode_480p.signature(queue='tasks', priority=3, immutable=True),
            transcode_360p.signature(queue='tasks', priority=4, immutable=True)
        )
        main_task = chain(
            common_setup.signature(queue='tasks', immutable=True),
            transcoding_tasks,
            end_processing.signature(queue='tasks', immutable=True),
        )
        main_task.apply_async()
        return 'Video is getting transcoded to all dimensions!'
    else:
        return 'ERROR: Wrong HTTP Method'
Here, common_setup is called and then the group transcoding_tasks runs after it, but end_processing is not called at all.
Somehow, after the group is executed, no further task runs. I've swapped the statements in the chain and checked, and the same problem occurs!
Am I doing something trivially wrong, or is this a bug?
Thanks!
UPDATE: Solution found!
This was quite an interesting bug! It took some time to figure out that the result backend should be a persistent backend like SQL or Redis.
So, I made this modification in Celery config:
- celeryconfig['CELERY_RESULT_BACKEND'] = 'amqp://'
+ celeryconfig['CELERY_RESULT_BACKEND'] = 'redis://localhost'
And, Celery chains (and chords) work perfectly.
Hope it helps!
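For context, a chain in which a group is followed by another task is implicitly upgraded to a chord, and chords need a real result backend to know when all group members have finished, which is presumably why the amqp:// backend failed here. Written explicitly, the workflow might look like this (a sketch reusing the transcoding_tasks group and signatures from the view above):

from celery import chain, chord

main_task = chain(
    common_setup.signature(queue='tasks', immutable=True),
    chord(
        transcoding_tasks,                                        # the group of transcode_* tasks
        end_processing.signature(queue='tasks', immutable=True),  # chord callback, runs after the whole group
    ),
)
main_task.apply_async()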