stop django command using sys.exit() - django

Hi, I have a problem with Django management commands: I need to stop the command if some condition happens. Normally in a Python script I do this using sys.exit(), because I don't want the script to keep doing things. I tried this with Django and it doesn't work. Is there another way to stop a running command?
Health and good things.

from the docs:
from django.core.management.base import BaseCommand, CommandError
from polls.models import Poll

class Command(BaseCommand):
    help = 'Closes the specified poll for voting'

    def add_arguments(self, parser):
        parser.add_argument('poll_id', nargs='+', type=int)

    def handle(self, *args, **options):
        for poll_id in options['poll_id']:
            try:
                poll = Poll.objects.get(pk=poll_id)
            except Poll.DoesNotExist:
                raise CommandError('Poll "%s" does not exist' % poll_id)

            poll.opened = False
            poll.save()

            self.stdout.write(self.style.SUCCESS('Successfully closed poll "%s"' % poll_id))
i.e. you should raise a CommandError
though sys.exit() generally ought to work fine too (you said it didn't for you; if it were me, I'd be curious to work out why not anyway)

I am surprised to see these solutions, for neither of them works. Sys.exit() just finishes off the current thread and raising a 'CommandError' only starts exception handling. Nor does raising a KeyboardInterrupt have any effect. Perhaps these solutions once worked with earlier versions, or with the server started in a different context.
The one solution I came across here is _thread.interrupt_main(). It stops the server after neatly finishing the current request.
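For reference, here is a minimal standalone sketch of what _thread.interrupt_main() does (Python 3; the dev server's main loop is simulated by a plain while loop, and the fatal condition by a sleep):

import _thread
import threading
import time

def request_thread():
    # pretend a request handler hits the fatal condition after a second
    time.sleep(1)
    _thread.interrupt_main()  # raises KeyboardInterrupt in the main thread

threading.Thread(target=request_thread).start()

try:
    while True:  # stands in for the dev server's main loop
        time.sleep(0.1)
except KeyboardInterrupt:
    print("main thread interrupted, shutting down")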

Related

Django. Celery delay() not printing to terminal

I have a simple Celery task in Django:
from celery.decorators import task

@task
def celery_test(x, y):
    print x + y
    return None
I call it in a view:
...
def get_queryset(self, *args, **kwargs):
    celery_test.delay("uno ", "dos")
    ...
So when I call the function with delay() it doesn't print anything to the terminal. Why? When I call it without delay() it prints correctly. I'm using a RabbitMQ server and it is running fine.
Check your log file for celery. It'll probably print it there.
EDIT
When you use .delay() the task is decoupled from the terminal and all you get printed is something like this:
AsyncResult: 3df665f1-547d-49bb-937b-8190a63bfeb7.
What happens is that you've passed the code to the broker, and the broker can't print back to your terminal. But if you have print statements in your code, they'll be printed to your Celery log.
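If you want the output somewhere predictable, a common alternative (sketched here under the assumption of a reasonably recent Celery and a tasks module of your own) is to use Celery's task logger instead of print:

from celery import shared_task
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

@shared_task
def celery_test(x, y):
    # this ends up in the worker's log output, which is where .delay() runs
    logger.info("result: %s", x + y)
    return None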

Logging request timeouts on Django + Gunicorn + Heroku

We have a Django app running Gunicorn with sync workers that's deployed on Heroku. Our request response time shows several requests that hit 30s (and die), which is the default Gunicorn timeout.
What is the best way to log these requests and analyze the timeout? Gunicorn doesn't seem to provide a hook for catching these timeouts, at least not something that's obvious.
One rather rough way to do it is have a "watchdog" timer that interrupts the process after, say, 25 seconds. Once you have an idea of which procs are slow, you can refine the data to figure out what's going on.
Example:
import signal

def timeout(_signum, _frame):
    print 'TIMEOUT'

signal.signal(signal.SIGALRM, timeout)
signal.alarm(1)  # send SIGALRM in 1 second

print 'waiting'
signal.pause()
print 'done'
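Adapting that idea to Django, here is a hedged sketch of a watchdog middleware: old-style middleware, a 25-second threshold as an example, and the assumption that you are on sync workers so the alarm fires in the process handling the request:

import logging
import signal
import traceback

logger = logging.getLogger(__name__)

class TimeoutWatchdogMiddleware(object):
    """Log a stack trace shortly before Gunicorn's 30s timeout would kill the request."""

    def process_request(self, request):
        def alarm_handler(_signum, frame):
            stack = ''.join(traceback.format_stack(frame))
            logger.error("Request close to timing out: %s\n%s", request.path, stack)
        signal.signal(signal.SIGALRM, alarm_handler)
        signal.alarm(25)  # a few seconds before the 30s Gunicorn default

    def process_response(self, request, response):
        signal.alarm(0)  # cancel the watchdog; the request finished in time
        return response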
Another approach is to fire off a Thread which pokes the main code after a certain amount of elapsed time. It has several caveats -- be sure to read the ActiveState link.
Here's one implementation by Aaron Swartz from ActiveState.com
import sys  # needed for sys.exc_info() below
import threading

class TimeoutError(Exception): pass

def timelimit(timeout):
    def internal(function):
        def internal2(*args, **kw):
            class Calculator(threading.Thread):
                def __init__(self):
                    threading.Thread.__init__(self)
                    self.result = None
                    self.error = None

                def run(self):
                    try:
                        self.result = function(*args, **kw)
                    except:
                        self.error = sys.exc_info()[0]

            c = Calculator()
            c.start()
            c.join(timeout)
            if c.isAlive():
                raise TimeoutError
            if c.error:
                raise c.error
            return c.result
        return internal2
    return internal
https://github.com/benoitc/gunicorn/pull/768/files added a worker_abort signal which is what I'm using in this case.
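For example, a hedged sketch of a Gunicorn config module using that hook to dump thread stacks when a worker is aborted for exceeding the timeout (the file name gunicorn.conf.py and the log wording are my own):

# gunicorn.conf.py
import sys
import traceback

def worker_abort(worker):
    # runs in the worker process when it receives SIGABRT,
    # which is what the master sends on a timeout
    worker.log.error("worker timed out (pid %s), dumping stacks", worker.pid)
    for thread_id, frame in sys._current_frames().items():
        stack = ''.join(traceback.format_stack(frame))
        worker.log.error("thread %s:\n%s", thread_id, stack)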

Django celery task keep global state

I am currently developing a Django application based on django-tenants-schema. You don't need to look into the actual code of the module, but the idea is that it has a global setting for the current database connection defining which schema to use for the application tenant, e.g.
tenant = tenants_schema.get_tenant()
And for setting it:
tenants_schema.set_tenant(xxx)
For some of the tasks, I would like them to remember the global tenant that was selected when the task was instantiated, e.g. in theory:
class AbstractTask(Task):
    '''
    Run this method before returning the task future
    '''
    def before_submit(self):
        self.run_args['tenant'] = tenants_schema.get_tenant()

    '''
    This method is run before related .run() task method
    '''
    def before_run(self):
        tenants_schema.set_tenant(self.run_args['tenant'])
Is there an elegant way of doing it in celery?
Celery (as of 3.1) has signals you can hook into to do this. You can alter the kwargs that were passed in, and on the other side, undo your alterations before they're given to the actual task:
from celery import shared_task
from celery.signals import before_task_publish, task_prerun, task_postrun
from threading import local

current_tenant = local()

@before_task_publish.connect
def add_tenant_to_task(body=None, **unused):
    body['kwargs']['tenant_middleware.tenant'] = getattr(current_tenant, 'id', None)
    print 'sending tenant: {t}'.format(t=current_tenant.id)

@task_prerun.connect
def extract_tenant_from_task(kwargs=None, **unused):
    tenant_id = kwargs.pop('tenant_middleware.tenant', None)
    current_tenant.id = tenant_id
    print 'current_tenant.id set to {t}'.format(t=tenant_id)

@task_postrun.connect
def cleanup_tenant(**kwargs):
    current_tenant.id = None
    print 'cleaned current_tenant.id'

@shared_task
def get_current_tenant():
    # Here is where you would do work that relied on current_tenant.id being set.
    import time
    time.sleep(1)
    return current_tenant.id
And if you run the task (not showing logging from the worker):
In [1]: current_tenant.id = 1234; ct = get_current_tenant.delay(); current_tenant.id = 5678; ct.get()
sending tenant: 1234
Out[1]: 1234
In [2]: current_tenant.id
Out[2]: 5678
The signals are not called if no message is sent (when you call the task function directly, without delay() or apply_async()). If you want to filter on the task name, it is available as body['task'] in the before_task_publish signal handler, and the task object itself is available in the task_prerun and task_postrun handlers.
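For example, a hedged sketch of the publish-side handler above with such a filter added (the myapp.tasks prefix is made up):

@before_task_publish.connect
def add_tenant_to_task(body=None, **unused):
    # only tag our own tasks; anything else passes through untouched
    if not body.get('task', '').startswith('myapp.tasks.'):
        return
    body['kwargs']['tenant_middleware.tenant'] = getattr(current_tenant, 'id', None)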
I am a Celery newbie, so I can't really tell if this is the "blessed" way of doing "middleware"-type stuff in Celery, but I think it will work for me.
I'm not sure what you mean here: is before_submit executed before the task is called by a client?
In that case I would rather use a with statement here:
from contextlib import contextmanager

@contextmanager
def set_tenant_db(tenant):
    prev_tenant = tenants_schema.get_tenant()
    try:
        tenants_schema.set_tenant(tenant)
        yield
    finally:
        tenants_schema.set_tenant(prev_tenant)

@app.task
def tenant_task(tenant=None):
    with set_tenant_db(tenant):
        do_actions_here()

tenant_task.delay(tenant=tenants_schema.get_tenant())
You can of course create a base task that does this automatically; you can apply the context in Task.__call__ for example, but I'm not sure that saves you much over just using the with statement explicitly.
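For what it's worth, a hedged sketch of that base-task idea, assuming the set_tenant_db context manager above and a Celery app object named app (the TenantTask name is made up):

from celery import Task

class TenantTask(Task):
    abstract = True  # only used as a base class, never registered itself

    def __call__(self, *args, **kwargs):
        # pull the tenant out of the kwargs and wrap the actual run
        # in the context manager defined above
        tenant = kwargs.pop('tenant', None)
        with set_tenant_db(tenant):
            return super(TenantTask, self).__call__(*args, **kwargs)

@app.task(base=TenantTask)
def tenant_task():
    do_actions_here()

tenant_task.delay(tenant=tenants_schema.get_tenant())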

django/celery: Best practices to run tasks on 150k Django objects?

I have to run tasks on approximately 150k Django objects. What is the best way to do this? I am using the Django ORM as the broker. The database backend is MySQL, and it chokes and dies during the task.delay() of all the tasks. Relatedly, I also wanted to kick this off from the submission of a form, but the resulting request produced a very long response time that timed out.
I would also consider using something other than the database as the "broker"; it really isn't suitable for this kind of work.
Though, you can move some of this overhead out of the request/response cycle by launching a task to create the other tasks:
from celery.task import TaskSet, task
from myapp.models import MyModel

@task
def process_object(pk):
    obj = MyModel.objects.get(pk=pk)
    # do something with obj

@task
def process_lots_of_items(ids_to_process):
    return TaskSet(process_object.subtask((id, ))
                   for id in ids_to_process).apply_async()
Also, since you probably don't have 15000 processors to process all of these objects in parallel, you could split the objects into chunks of, say, 100 or 1000 each:
from itertools import islice
from celery.task import TaskSet, task
from myapp.models import MyModel

def chunks(it, n):
    for first in it:
        yield [first] + list(islice(it, n - 1))

@task
def process_chunk(pks):
    objs = MyModel.objects.filter(pk__in=pks)
    for obj in objs:
        # do something with obj
        pass

@task
def process_lots_of_items(ids_to_process):
    return TaskSet(process_chunk.subtask((chunk, ))
                   for chunk in chunks(iter(ids_to_process),
                                       1000)).apply_async()
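For completeness, a hedged sketch of the view side (names and response wording are illustrative): the request only enqueues the single dispatcher task, so the form submission returns quickly instead of timing out.

from django.http import HttpResponse

from myapp.models import MyModel
from myapp.tasks import process_lots_of_items  # module path assumed

def start_processing(request):
    ids = list(MyModel.objects.values_list('pk', flat=True))
    process_lots_of_items.delay(ids)
    return HttpResponse("Processing %d objects in the background." % len(ids))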
Try using RabbitMQ instead.
RabbitMQ is used in a lot of bigger companies and people really rely on it, since it's such a great broker.
Here is a great tutorial on getting started with it.
I use beanstalkd ( http://kr.github.com/beanstalkd/ ) as the engine. Adding a worker and a task is pretty straightforward for Django if you use django-beanstalkd : https://github.com/jonasvp/django-beanstalkd/
It’s very reliable for my usage.
Example of worker :
import os
import time
from django_beanstalkd import beanstalk_job

@beanstalk_job
def background_counting(arg):
    """
    Do some incredibly useful counting to the value of arg
    """
    value = int(arg)
    pid = os.getpid()
    print "[%s] Counting from 1 to %d." % (pid, value)
    for i in range(1, value + 1):
        print '[%s] %d' % (pid, i)
        time.sleep(1)
To launch a job/worker/task:
from django_beanstalkd import BeanstalkClient
client = BeanstalkClient()
client.call('beanstalk_example.background_counting', '5')
(source extracted from example app of django-beanstalkd)
Enjoy !

How to have django give a HTTP response before continuing on to complete a task associated to the request?

In my Django Piston API, I want to yield/return an HTTP response to the client before calling another function that will take quite some time. How do I make the yield give an HTTP response containing the desired JSON and not a string relating to the creation of a generator object?
My piston handler method looks like so:
def create(self, request):
    data = request.data
    # *other operations......................*
    incident.save()
    response = rc.CREATED
    response.content = {"id": str(incident.id)}
    yield response
    manage_incident(incident)
Instead of the response I want, like:
{"id":"13"}
The client gets a string like this:
"<generator object create at 0x102c50050>"
EDIT:
I realise that using yield was the wrong way to go about this. In essence, what I am trying to achieve is that the client receives a response right away, before the server moves on to the time-costly manage_incident() function.
This doesn't have anything to do with generators or yielding, but I've used the following code and decorator to have things run in the background while returning the client an HTTP response immediately.
Usage:
@postpone
def long_process():
    do things...

def some_view(request):
    long_process()
    return HttpResponse(...)
And here's the code to make it work:
import atexit
import Queue
import threading

from django.core.mail import mail_admins

def _worker():
    while True:
        func, args, kwargs = _queue.get()
        try:
            func(*args, **kwargs)
        except:
            import traceback
            details = traceback.format_exc()
            mail_admins('Background process exception', details)
        finally:
            _queue.task_done()  # so we can join at exit

def postpone(func):
    def decorator(*args, **kwargs):
        _queue.put((func, args, kwargs))
    return decorator

_queue = Queue.Queue()
_thread = threading.Thread(target=_worker)
_thread.daemon = True
_thread.start()

def _cleanup():
    _queue.join()  # so we don't exit too soon

atexit.register(_cleanup)
Perhaps you could do something like this (be careful though):
import threading

def create(self, request):
    data = request.data
    # do stuff...
    t = threading.Thread(target=manage_incident,
                         args=(incident,))
    t.setDaemon(True)
    t.start()
    return response
Has anyone tried this? Is it safe? My guess is it's not, mostly because of concurrency issues, but also because if you get a lot of requests you might end up with a lot of threads running for a while at the same time. Still, it might be worth a shot.
Otherwise, you could just add the incident that needs to be managed to your database and handle it later via a cron job or something like that.
I don't think Django is built for either concurrency or very time-consuming operations.
Edit
Someone has tried it, and it seems to work.
Edit 2
These kinds of things are often better handled by background jobs. The Django Background Tasks library is nice, but there are others, of course.
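For example, a hedged sketch of offloading manage_incident() with django-background-tasks, assuming the background_task app is installed, a worker is running via manage.py process_tasks, and an Incident model exists as in the question:

from background_task import background

from .models import Incident  # model name assumed from the question

@background(schedule=0)  # run as soon as a worker picks it up
def manage_incident_later(incident_id):
    # only pass serializable arguments to a background task,
    # so look the object up again inside it
    incident = Incident.objects.get(pk=incident_id)
    manage_incident(incident)  # the slow function from the question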
You've turned your view into a generator thinking that Django will pick up on that fact and handle it appropriately. Well, it won't.
def create(self, request):
    # real_create is your original generator-based view, renamed
    return HttpResponse(real_create(request))
EDIT:
Since you seem to be having trouble... visualizing it...
def stuff():
    print 1
    yield 'foo'
    print 2

for i in stuff():
    print i
output:
1
foo
2