Getting celery task id - django

I have something like this:
@app.task
def some_task():
    logger.info(app.current_task.request.id)
    some_func()
def some_func():
    logger.info(app.current_task.request.id)
I get the expected id inside some_task, but it is None inside some_func. How can I get the real task id?

You could bind the task and pass the request around explicitly rather than relying on a global:
@app.task(bind=True)
def some_task(self):
    logger.info(self.request.id)
    some_func(self.request)
def some_func(celery_request=None):
    # celery_request is optional, in case some_func is also called outside a task.
    if celery_request:
        logger.info(celery_request.id)
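To see the pattern without a running worker, here is a minimal sketch. `fake_request` is a hypothetical stand-in for `self.request` (it is not a Celery object); it only shows that `some_func` behaves sensibly both when a request is handed to it and when it is not.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)


def some_func(celery_request=None):
    """Return the task id if a request was passed in, else None."""
    task_id = celery_request.id if celery_request is not None else None
    logger.info("task id: %s", task_id)
    return task_id


# Hypothetical stand-in for the `self.request` of a bound task.
fake_request = SimpleNamespace(id="abc-123")

inside_task = some_func(fake_request)  # as if called from a bound task
outside_task = some_func()             # as if called from ordinary code
```

Passing the request explicitly keeps `some_func` testable on its own, at the cost of one extra parameter.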

Related

Mocking Celery `self.request` attribute for bound tasks when called directly

I have a task foobar:
@app.task(bind=True)
def foobar(self, owner, a, b):
    if already_working(owner):  # check if a foobar task is already running for owner.
        register_myself(self.request.id, owner)  # add myself in the DB.
    return a + b
How can I mock the self.request.id attribute? I am already patching everything and calling the task directly rather than using .delay/.apply_async, but self.request.id seems to be None (and since I am doing real interactions with the DB, that makes the test fail, etc.).
For reference, I'm using Django as a framework, but I think this problem is the same no matter which environment you're using.
Disclaimer: I do not think this is documented anywhere, and this answer might be implementation-dependent.
Celery wraps its tasks in celery.Task instances; I do not know whether it replaces the celery.Task.run method with the user's task function or does something else.
But when you call a task directly, you call __call__, which pushes a context containing the task ID, etc.
So the idea is to bypass __call__ and Celery's usual machinery:
first, push a controlled task ID: foobar.push_request(id=1), for example;
then, call the run method: foobar.run(*args, **kwargs).
Example:
from unittest.mock import patch
@app.task(bind=True)
def foobar(self, name):
    print(name)
    return foobar.utils.polling(self.request.id)
@patch('foobar.utils.polling')
def test_foobar(mock_polling):
    foobar.push_request(id=1)
    mock_polling.return_value = "done"
    assert foobar.run("test") == "done"
    mock_polling.assert_called_once_with(1)
You can call the task synchronously using apply():
task = foobar.s(<args>).apply()
This will assign a unique task ID, so the value will not be None and your code will run. Then you can check the results as part of your test.
There is probably a way to do this with patch, but I could not work out how to assign a property. The most straightforward way is to just mock self.
tasks.py:
@app.task(name='my_task')
def my_task(self, *args, **kwargs):
    ...  # do something
test_tasks.py:
from mock import Mock
def test_my_task():
    self = Mock()
    self.request.id = 'ci_test'
    my_task(self)
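The "mock self" idea can be tried out with the standard library alone. This is only a sketch under assumptions: `my_task_body` and `register` are hypothetical names, and the task logic is assumed to live in a plain function that receives the bound task as its first argument.

```python
from unittest.mock import Mock


def my_task_body(self, owner, a, b, register):
    """Plain function holding the task logic; `self` plays the bound task."""
    register(self.request.id, owner)
    return a + b


# Mock() creates attributes on demand, so self.request.id can simply be assigned.
fake_self = Mock()
fake_self.request.id = "ci_test"
register = Mock()  # hypothetical replacement for the real DB call

result = my_task_body(fake_self, "alice", 1, 2, register)
register.assert_called_once_with("ci_test", "alice")
```

Because no Celery machinery is involved, the test controls the id completely and never touches the broker.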

Access flask application context from within greenlet

I have a flask-script command that spawns a long sequence of greenlets. The problem is that these greenlets are unable to access my app context. I always get a "...> failed with RuntimeError" error (when accessing app.logger, for example). Any suggestions?
One of my attempts:
spawn(method, app, arg1, arg2)
def method(app, arg1, arg2):
    with app.app_context():
        app.logger.debug('bla bla')  # doesn't work
        # ... do stuff
Edit: the approach below gives you access to the request object but not to current_app, so it is probably not what you are looking for.
You are probably looking for flask.copy_current_request_context(f), which is documented here: http://flask.pocoo.org/docs/0.10/api/#flask.copy_current_request_context
Example:
import gevent
from flask import copy_current_request_context
@app.route('/')
def index():
    @copy_current_request_context
    def do_some_work():
        # do some work here, it can access flask.request like you
        # would otherwise in the view function.
        ...
    gevent.spawn(do_some_work)
    return 'Regular response'
You can pass a copy of the relevant info from the request, e.g.:
import gevent
from flask import request
@app.route('/')
def index():
    def do_some_work(data):
        # do some work here with data
        ...
    data = request.get_json()
    gevent.spawn(do_some_work, data)
    return 'Regular response'
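Passing plain data does not depend on gevent specifically. Here is a rough sketch that runs with the standard library only, under two assumptions: `threading.Thread` stands in for `gevent.spawn`, and a plain dict stands in for `request.get_json()`.

```python
import threading

results = []


def do_some_work(data):
    # Uses only the plain data it was given; no request or app context needed.
    results.append(data["name"].upper())


def index(payload):
    # Copy what the worker needs *before* handing off; the request may be gone later.
    data = dict(payload)
    worker = threading.Thread(target=do_some_work, args=(data,))
    worker.start()
    return "Regular response", worker


response, worker = index({"name": "tenant-a"})
worker.join()  # only so the example can inspect the result
```

The background function never touches framework globals, so it does not matter whether a request or application context is still active when it runs.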

Passing web request context transparently to a celery task

I have a multi-tenant setup where I'd like to pass certain customer-specific information, specifically request.host, to the celery task, where ideally it should be available in a global variable. Is there a way to set this up in a manner transparent to the application?
The task would be called the same way:
my_background_func.delay(foo, bar)
The task is defined the same way, except that it has access to a global variable called 'request' with an attribute 'host':
@celery_app.task
def my_background_func(foo, bar):
    print("running the task for host: " + request.host)
Here's how I solved it:
from celery import Task
class MyTask(Task):
    abstract = True
    def delay(self, *args, **kwargs):
        # Attach the current web request's host to every message sent via delay().
        return self.apply_async(args, kwargs, headers={'host': request.host})
On the client:
@celery_app.task(base=MyTask, bind=True)
def hellohost(task):
    return "hello " + task.request.headers['host']
It works, but strangely hellohost.delay().get() hangs on the client.

Django celery task keep global state

I am currently developing a Django application based on django-tenants-schema. You don't need to look into the actual code of the module, but the idea is that it has a global setting for the current database connection defining which schema to use for the application tenant, e.g.
tenant = tenants_schema.get_tenant()
And for setting
tenants_schema.set_tenant(xxx)
For some of the tasks I would like them to remember the current global tenant selected during the instantiation, e.g. in theory:
class AbstractTask(Task):
    def before_submit(self):
        '''Run this method before returning the task future.'''
        self.run_args['tenant'] = tenants_schema.get_tenant()
    def before_run(self):
        '''This method is run before the related .run() task method.'''
        tenants_schema.set_tenant(self.run_args['tenant'])
Is there an elegant way of doing it in celery?
Celery (as of 3.1) has signals you can hook into to do this. You can alter the kwargs that were passed in, and on the other side, undo your alterations before they're given to the actual task:
from celery import shared_task
from celery.signals import before_task_publish, task_prerun, task_postrun
from threading import local
current_tenant = local()
@before_task_publish.connect
def add_tenant_to_task(body=None, **unused):
    # Assumes body is a dict with a 'kwargs' key, as in Celery 3.1.
    tenant_id = getattr(current_tenant, 'id', None)
    body['kwargs']['tenant_middleware.tenant'] = tenant_id
    print('sending tenant: {t}'.format(t=tenant_id))
@task_prerun.connect
def extract_tenant_from_task(kwargs=None, **unused):
    tenant_id = kwargs.pop('tenant_middleware.tenant', None)
    current_tenant.id = tenant_id
    print('current_tenant.id set to {t}'.format(t=tenant_id))
@task_postrun.connect
def cleanup_tenant(**kwargs):
    current_tenant.id = None
    print('cleaned current_tenant.id')
@shared_task
def get_current_tenant():
    # Here is where you would do work that relied on current_tenant.id being set.
    import time
    time.sleep(1)
    return current_tenant.id
And if you run the task (not showing logging from the worker):
In [1]: current_tenant.id = 1234; ct = get_current_tenant.delay(); current_tenant.id = 5678; ct.get()
sending tenant: 1234
Out[1]: 1234
In [2]: current_tenant.id
Out[2]: 5678
The signals are not called if no message is sent (when you call the task function directly, without delay() or apply_async()). If you want to filter on the task name, it is available as body['task'] in the before_task_publish signal handler, and the task object itself is available in the task_prerun and task_postrun handlers.
I am a Celery newbie, so I can't really tell if this is the "blessed" way of doing "middleware"-type stuff in Celery, but I think it will work for me.
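The round trip the signal handlers perform can be simulated without Celery. This is only a sketch: `on_publish`, `on_prerun` and `on_postrun` are hypothetical plain functions standing in for the three signal handlers, and both "sides" run in one process here, whereas a real client and worker are separate.

```python
from threading import local

current_tenant = local()
TENANT_KEY = "tenant_middleware.tenant"


def on_publish(kwargs):
    """Client side: stash the current tenant in the outgoing kwargs."""
    kwargs[TENANT_KEY] = getattr(current_tenant, "id", None)


def on_prerun(kwargs):
    """Worker side: remove the extra kwarg and restore the tenant."""
    current_tenant.id = kwargs.pop(TENANT_KEY, None)


def on_postrun():
    """Worker side: reset so the next task does not inherit the tenant."""
    current_tenant.id = None


def get_current_tenant():
    return current_tenant.id


# Publish on the "client", which then moves on to another tenant.
current_tenant.id = 1234
message_kwargs = {}
on_publish(message_kwargs)
current_tenant.id = 5678

# The "worker" handles the message some time later.
on_prerun(message_kwargs)
seen_by_task = get_current_tenant()
on_postrun()
```

The task sees the tenant that was current at publish time (1234), not the one the client switched to afterwards, and the extra kwarg is removed before the task function would receive its arguments.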
I'm not sure what you mean here: is before_submit executed before the task is called by a client?
In that case I would rather use a with statement:
from contextlib import contextmanager
@contextmanager
def set_tenant_db(tenant):
    prev_tenant = tenants_schema.get_tenant()
    try:
        tenants_schema.set_tenant(tenant)
        yield
    finally:
        tenants_schema.set_tenant(prev_tenant)
@app.task
def tenant_task(tenant=None):
    with set_tenant_db(tenant):
        do_actions_here()
tenant_task.delay(tenant=tenants_schema.get_tenant())
You can of course create a base task that does this automatically (you can apply the context in Task.__call__, for example), but I'm not sure that saves you much if you can just use the with statement explicitly.
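The point of the try/finally in the context manager is that the previous tenant comes back even when the task body raises. A small sketch, assuming a hypothetical in-memory `FakeTenantsSchema` in place of the real tenants_schema module:

```python
from contextlib import contextmanager


class FakeTenantsSchema:
    """Hypothetical in-memory stand-in for the tenants_schema module."""

    def __init__(self, tenant):
        self._tenant = tenant

    def get_tenant(self):
        return self._tenant

    def set_tenant(self, tenant):
        self._tenant = tenant


tenants_schema = FakeTenantsSchema("public")


@contextmanager
def set_tenant_db(tenant):
    prev_tenant = tenants_schema.get_tenant()
    try:
        tenants_schema.set_tenant(tenant)
        yield
    finally:
        # Runs on success and on failure alike.
        tenants_schema.set_tenant(prev_tenant)


def tenant_task(tenant=None):
    with set_tenant_db(tenant):
        active = tenants_schema.get_tenant()
        raise RuntimeError(f"task failed while on {active}")


try:
    tenant_task(tenant="customer_a")
except RuntimeError as exc:
    error_message = str(exc)

restored = tenants_schema.get_tenant()
```

A failing task therefore cannot leave the worker stuck on the wrong tenant for the next task.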

How to have django give a HTTP response before continuing on to complete a task associated to the request?

In my django piston API, I want to yield/return an HTTP response to the client before calling another function that will take quite some time. How do I make the yield give an HTTP response containing the desired JSON and not a string relating to the creation of a generator object?
My piston handler method looks like this:
def create(self, request):
    data = request.data
    # ... other operations ...
    incident.save()
    response = rc.CREATED
    response.content = {"id": str(incident.id)}
    yield response
    manage_incident(incident)
Instead of the response I want, like:
{"id":"13"}
The client gets a string like this:
"<generator object create at 0x102c50050>"
EDIT:
I realise that using yield was the wrong way to go about this. In essence, what I am trying to achieve is that the client receives a response right away, before the server moves on to the time-costly manage_incident() function.
This doesn't have anything to do with generators or yielding, but I've used the following code and decorator to have things run in the background while returning the client an HTTP response immediately.
Usage:
@postpone
def long_process():
    ...  # do things
def some_view(request):
    long_process()
    return HttpResponse(...)
And here's the code to make it work:
import atexit
import queue
import threading
import traceback
from django.core.mail import mail_admins
def _worker():
    while True:
        func, args, kwargs = _queue.get()
        try:
            func(*args, **kwargs)
        except Exception:
            # Report the failure by email instead of killing the worker thread.
            details = traceback.format_exc()
            mail_admins('Background process exception', details)
        finally:
            _queue.task_done()  # so we can join at exit
def postpone(func):
    def decorator(*args, **kwargs):
        _queue.put((func, args, kwargs))
    return decorator
_queue = queue.Queue()
_thread = threading.Thread(target=_worker)
_thread.daemon = True
_thread.start()
def _cleanup():
    _queue.join()  # so we don't exit too soon
atexit.register(_cleanup)
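To try the decorator outside Django, here is a compact self-contained sketch of the same idea. It makes a few assumptions of its own: `logging.exception` replaces `mail_admins`, `functools.wraps` is added so the wrapped function keeps its name, and `processed` is a hypothetical list used only to observe the result.

```python
import functools
import logging
import queue
import threading

_queue = queue.Queue()


def _worker():
    while True:
        func, args, kwargs = _queue.get()
        try:
            func(*args, **kwargs)
        except Exception:
            logging.exception("Background job failed")  # stand-in for mail_admins
        finally:
            _queue.task_done()


threading.Thread(target=_worker, daemon=True).start()


def postpone(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        _queue.put((func, args, kwargs))
    return wrapper


processed = []


@postpone
def long_process(incident_id):
    processed.append(incident_id)


returned = long_process(13)  # returns None immediately; the job is only queued
_queue.join()                # wait here only so the example can check the result
```

Note that the decorated call always returns None, so whatever the original function returns is lost; anything the caller needs has to be stored somewhere by the job itself.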
Perhaps you could do something like this (be careful though):
import threading
def create(self, request):
    data = request.data
    # do stuff...
    t = threading.Thread(target=manage_incident,
                         args=(incident,))
    t.daemon = True
    t.start()
    return response
Has anyone tried this? Is it safe? My guess is that it's not, mostly because of concurrency issues, but also because a lot of requests could mean a lot of threads (since each may run for a while). Still, it might be worth a shot.
Otherwise, you could just add the incident that needs to be managed to your database and handle it later via a cron job or something like that.
I don't think Django is built for either concurrency or very time-consuming operations.
Edit
Someone has tried it, and it seems to work.
Edit 2
These kinds of things are often better handled by background jobs. The Django Background Tasks library is nice, but there are others of course.
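On the worry about a flood of requests spawning a flood of threads: one way to cap that, swapped in here in place of a bare `threading.Thread` per request, is a fixed-size `concurrent.futures.ThreadPoolExecutor`. This is a sketch only; the pool size of 4 and the simplified `create`/`manage_incident` signatures are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# At most 4 jobs run at once; extra submissions wait in the executor's queue.
executor = ThreadPoolExecutor(max_workers=4)

handled = []


def manage_incident(incident_id):
    handled.append(incident_id)  # placeholder for the slow work


def create(incident_id):
    # Hand the slow part to the pool and answer right away.
    executor.submit(manage_incident, incident_id)
    return {"id": str(incident_id)}


responses = [create(i) for i in range(10)]
executor.shutdown(wait=True)  # only so the example can inspect `handled`
```

Queued jobs still live in the web process's memory and are lost if it dies, which is one reason a proper background job library tends to be the more robust choice.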
You've turned your view into a generator thinking that Django will pick up on that fact and handle it appropriately. Well, it won't.
def create(self, request):
    return HttpResponse(real_create(request))
EDIT:
Since you seem to be having trouble... visualizing it...
def stuff():
    print(1)
    yield 'foo'
    print(2)
for i in stuff():
    print(i)
output:
1
foo
2
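This also explains the "<generator object create at ...>" string the client received: calling a function that contains yield runs none of its body and just hands back a generator object. A small sketch, with a simplified `create` that takes no arguments:

```python
def create():
    yield {"id": "13"}
    # Anything after the first yield runs only if the caller keeps iterating.


result = create()             # no code in create() has run yet
kind = type(result).__name__
first = next(result)          # runs the body up to the first yield
```

The framework received `result`, never iterated it, and turned the object itself into the response body.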