Logging request timeouts on Django + Gunicorn + Heroku

We have a Django app running Gunicorn with sync workers, deployed on Heroku. Our response-time metrics show several requests hitting 30s (and dying), which is Gunicorn's default timeout.
What is the best way to log these requests and analyze the timeout? Gunicorn doesn't seem to provide a hook for catching these timeouts, at least not something that's obvious.

One rather rough way to do it is to have a "watchdog" timer that interrupts the process after, say, 25 seconds. Once you have an idea of which requests are slow, you can refine the data to figure out what's going on.
Example:
import signal

def timeout(_signum, _frame):
    print('TIMEOUT')

signal.signal(signal.SIGALRM, timeout)
signal.alarm(1)  # send SIGALRM in 1 second

print('waiting')
signal.pause()
print('done')
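Applied to the Django case, the same idea fits in a middleware. This is a minimal sketch, assuming sync workers (where SIGALRM handling is safe); the 25-second margin below Gunicorn's 30s timeout is an arbitrary choice:

import logging
import signal
import traceback

logger = logging.getLogger(__name__)

def _watchdog(signum, frame):
    # Log the current stack so the slow spot in the request is visible.
    logger.error("Request watchdog fired:\n%s", "".join(traceback.format_stack(frame)))

class WatchdogMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response
        signal.signal(signal.SIGALRM, _watchdog)

    def __call__(self, request):
        signal.alarm(25)  # fire shortly before Gunicorn's 30s timeout kills the worker
        try:
            return self.get_response(request)
        finally:
            signal.alarm(0)  # cancel the alarm for requests that finish in time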

Another approach is to fire off a thread that interrupts the main code after a certain amount of elapsed time. It has several caveats -- be sure to read the ActiveState link.
Here's one implementation, by Aaron Swartz, from ActiveState.com:
import sys
import threading

class TimeoutError(Exception): pass

def timelimit(timeout):
    def internal(function):
        def internal2(*args, **kw):
            class Calculator(threading.Thread):
                def __init__(self):
                    threading.Thread.__init__(self)
                    self.result = None
                    self.error = None

                def run(self):
                    try:
                        self.result = function(*args, **kw)
                    except Exception:
                        self.error = sys.exc_info()[0]

            c = Calculator()
            c.start()
            c.join(timeout)
            if c.is_alive():
                raise TimeoutError
            if c.error:
                raise c.error
            return c.result
        return internal2
    return internal
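Usage is a plain decorator application, for example (fetch_report and do_expensive_query are hypothetical stand-ins):

@timelimit(2.0)
def fetch_report():
    # raises TimeoutError if the call takes longer than 2 seconds
    return do_expensive_query()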

https://github.com/benoitc/gunicorn/pull/768/files added a worker_abort server hook, which is what I'm using in this case.
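For reference, a minimal sketch of using that hook from gunicorn.conf.py; worker_abort runs inside the worker as it handles the SIGABRT sent on timeout, so dumping the stack there should show where the request was stuck:

# gunicorn.conf.py
import io
import traceback

def worker_abort(worker):
    # Called when a worker is aborted, e.g. after hitting the request timeout.
    buf = io.StringIO()
    traceback.print_stack(file=buf)
    worker.log.critical("Worker timed out, stack:\n%s", buf.getvalue())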

Related

Task overlap in Django-Q

I have a task that I want to run every minute so the data is as fresh as possible. However, depending on the size of the update, it can take longer than one minute to finish. Django-Q still creates and queues a new task every minute, so overlapping runs end up synchronizing largely the same data. Is it possible to not schedule a task that is already in progress?
I ended up creating a decorator that locks task execution; a new run simply returns immediately if the lock is not available. The lock timeout is 1 hour (enough in my case).
from functools import wraps

from django.core.cache import cache
from redis.exceptions import LockNotOwnedError

def django_q_task_lock(func):
    """
    Decorator for django-q tasks, preventing overlap between parallel task runs.
    """
    @wraps(func)
    def wrapped_task(*args, **kwargs):
        task_lock = cache.lock(f"django_q-{func.__name__}", timeout=60 * 60)
        # If another run still holds the lock, return immediately.
        if task_lock.acquire(blocking=False):
            try:
                func(*args, **kwargs)
            finally:
                try:
                    task_lock.release()
                except LockNotOwnedError:
                    # The lock expired and was taken over by another run.
                    pass
    return wrapped_task

@django_q_task_lock
def potentialy_long_running_task():
    ...
    # task logic
    ...
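Note that cache.lock comes from django-redis; Django's generic cache API has no lock primitive. For completeness, one way to schedule the decorated task every minute with Django-Q (a sketch; the 'myapp.tasks' dotted path is hypothetical):

from django_q.models import Schedule
from django_q.tasks import schedule

schedule(
    'myapp.tasks.potentialy_long_running_task',
    schedule_type=Schedule.MINUTES,
    minutes=1,
)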

Number of queries executed over psycopg2 connection

I would like to know the number of sql queries which were executed on a psycopg2 connection.
Is there a way to get this number?
I would like to warn if an HTTP request produces too many statements.
I am running a Django application. If DEBUG is True, I have connection.queries, but I would like to get this value on a production server.
Update
I want numbers (statistics) from the prod environment. This question is not about debugging a particular http request.
Have a look at django-silk. It is a profiling tool that records metrics like response times and the number of queries.
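For reference, wiring silk in is just an app, a middleware, and a URL include. A sketch from memory of the silk README; check the current docs before relying on it:

# settings.py
INSTALLED_APPS += ['silk']
MIDDLEWARE += ['silk.middleware.SilkyMiddleware']

# urls.py
from django.urls import include, path
urlpatterns += [path('silk/', include('silk.urls', namespace='silk'))]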
If you want to roll your own solution and you are using Django 2.0+, you can create a middleware with a connection wrapper. The documentation even showcases a QueryLogger class:
import time

from django.db import connection

class QueryLogger:
    def __init__(self):
        self.queries = []

    def __call__(self, execute, sql, params, many, context):
        current_query = {'sql': sql, 'params': params, 'many': many}
        start = time.time()
        try:
            result = execute(sql, params, many, context)
        except Exception as e:
            current_query['status'] = 'error'
            current_query['exception'] = e
            raise
        else:
            current_query['status'] = 'ok'
            return result
        finally:
            duration = time.time() - start
            current_query['duration'] = duration
            self.queries.append(current_query)

class QueryLoggingMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        ql = QueryLogger()
        with connection.execute_wrapper(ql):
            response = self.get_response(request)
        # do something with ql.queries here
        return response
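To cover the "warn if a request produces too many statements" part, the placeholder comment can be filled in with a simple threshold check (a sketch; the cut-off of 50 is an arbitrary assumption):

import logging

logger = logging.getLogger(__name__)

QUERY_WARN_THRESHOLD = 50  # arbitrary; tune for your application

# in QueryLoggingMiddleware.__call__, in place of the placeholder comment:
if len(ql.queries) > QUERY_WARN_THRESHOLD:
    logger.warning(
        "%s %s executed %d queries",
        request.method, request.path, len(ql.queries),
    )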
The number of queries made in production and development is the same, provided the database and the rest of the environment match.
I recommend using the Django Debug Toolbar as mentioned, checking how many queries your view performs, and rethinking your code based on that. If you want to look at the performance of those queries, I recommend PostgreSQL's EXPLAIN command.
I usually copy the query and run it with EXPLAIN inside my PostgreSQL shell. See this: http://recordit.co/rGZ2SAo7PX

Celery group multiple tasks in one design

I'm just getting familiar with Celery and have a question. My setup is Django + Redis + Celery.
Let's take the example of a task that sends an email:
TASKS
@task
def send_email(message):
    mailserver.sendOneMessage(message)
VIEWS
class newaccount(APIView):
    def post(self, request, format=None):
        send_email.delay(request.data.email)
This works perfectly: Django sends messages to Redis, and they are picked up by Celery, which then executes the task. But I want to improve the system so that Celery picks up all messages from Redis at certain intervals and executes a single task with multiple messages. This is because connecting to the email server is slow, and sending multiple messages as a single request results in a faster process.
I want something like this to work:
TASKS
@task
def send_emails(messages):
    mailserver.sendMultipleMessages(messages)
Thoughts?
Since I am already using Redis as a cache (django-redis), I implemented the following workflow:
Step 1. Create a task that adds new emails to the cache
from celery import shared_task
from django.core.cache import cache

@shared_task()
def add_email(user_id):
    cache.set("email#{}".format(user_id), None, timeout=None)
Step 2. Create a periodic task that runs every second and looks for new emails in the cache
from datetime import timedelta

from celery.task import PeriodicTask
from django.core.cache import cache

class ProcessEmailsTask(PeriodicTask):
    run_every = timedelta(seconds=1)

    def run(self, **kwargs):
        call_email()

def call_email():
    item_exists = True
    ids = []
    while item_exists:
        try:
            key = next(cache.iter_keys("email#*"))
            ids.append(key.split("email#")[1])
            cache.delete_pattern(key)
        except StopIteration:
            # no more queued emails in the cache
            item_exists = False
    if len(ids) > 0:
        send_emails_to(ids)
Step 3. Run both celery workers and celery beat and profit!
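The snippet above leaves send_emails_to undefined; a minimal sketch, reusing the question's mailserver.sendMultipleMessages API and a hypothetical build_message helper, might look like:

@shared_task()
def send_emails_to(user_ids):
    # build one message per queued user and send the whole batch over a
    # single connection to the mail server
    messages = [build_message(user_id) for user_id in user_ids]
    mailserver.sendMultipleMessages(messages)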

stop django command using sys.exit()

Hi, I have a problem with Django management commands. I need to stop the command if some condition happens; in a normal Python script I do this with sys.exit(), because I don't want the script to keep doing things. I tried this with Django and it doesn't work. Is there another way to stop a running command?
Health and good things.
from the docs:
from django.core.management.base import BaseCommand, CommandError
from polls.models import Poll

class Command(BaseCommand):
    help = 'Closes the specified poll for voting'

    def add_arguments(self, parser):
        parser.add_argument('poll_id', nargs='+', type=int)

    def handle(self, *args, **options):
        for poll_id in options['poll_id']:
            try:
                poll = Poll.objects.get(pk=poll_id)
            except Poll.DoesNotExist:
                raise CommandError('Poll "%s" does not exist' % poll_id)
            poll.opened = False
            poll.save()
            self.stdout.write(self.style.SUCCESS('Successfully closed poll "%s"' % poll_id))
i.e. you should raise a CommandError
though sys.exit generally ought to work fine too (I mean, you said it didn't for you - if it was me I'd be curious to work out why not anyway)
I am surprised to see these solutions, for neither of them works. sys.exit() just finishes off the current thread, and raising a CommandError only starts exception handling. Nor does raising a KeyboardInterrupt have any effect. Perhaps these solutions once worked with earlier versions, or with the server started in a different context.
The one solution I came across here is _thread.interrupt_main(). It stops the server after neatly finishing the current request.
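A minimal sketch of that approach (a hypothetical view that shuts the dev server down once a condition is met):

import _thread

from django.http import HttpResponse

def shutdown_view(request):
    # interrupt_main() raises KeyboardInterrupt in the main thread; the
    # server finishes this request and then stops.
    _thread.interrupt_main()
    return HttpResponse("shutting down")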

Recover from task failed beyond max_retries

I am attempting to asynchronously consume a web service because it takes up to 45 seconds to return. Unfortunately, this web service is also somewhat unreliable and can throw errors. I have set up django-celery and have my tasks executing, which works fine until the task fails beyond max_retries.
Here is what I have so far:
@task(default_retry_delay=5, max_retries=10)
def request(xml):
    try:
        server = Client('https://www.whatever.net/RealTimeService.asmx?wsdl')
        xml = server.service.RunRealTimeXML(
            username=settings.WS_USERNAME,
            password=settings.WS_PASSWORD,
            xml=xml
        )
    except Exception as e:
        result = Result(celery_id=request.request.id, details=e.reason, status="i")
        result.save()
        try:
            return request.retry(exc=e)
        except MaxRetriesExceededError:
            result = Result(celery_id=request.request.id, details="Max Retries Exceeded", status="f")
            result.save()
            raise
    result = Result(celery_id=request.request.id, details=xml, status="s")
    result.save()
    return result
Unfortunately, MaxRetriesExceededError is not being thrown by retry(), so I'm not sure how to handle the failure of this task. Django has already returned HTML to the client, and I am checking the contents of Result via AJAX, which never reaches the full-failure "f" status.
So the question is: How can I update my database when the Celery task has exceeded max_retries?
The issue is that celery is trying to re-raise the exception you passed in when it hits the retry limit. The code for doing this re-raising is here: https://github.com/celery/celery/blob/v3.1.20/celery/app/task.py#L673-L681
The simplest way around this is to just not have celery manage your exceptions at all:
@task(max_retries=10)
def mytask():
    try:
        do_the_thing()
    except Exception as e:
        try:
            mytask.retry()
        except MaxRetriesExceededError:
            do_something_to_handle_the_error()
            logger.exception(e)
You can override the after_return method of the Celery task class; this method is called after the task executes, whatever the resulting status is (SUCCESS, FAILURE, RETRY):
class MyTask(celery.task.Task):
    def run(self, xml, **kwargs):
        ...  # your stuff here

    def after_return(self, status, retval, task_id, args, kwargs, einfo=None):
        if self.max_retries == int(kwargs['task_retries']):
            ...  # if max retries equals task retries, do something
        if status == "FAILURE":
            ...  # you can also do something when the task fails, instead of checking retries
http://readthedocs.org/docs/celery/en/latest/reference/celery.task.base.html#celery.task.base.BaseTask.after_return
http://celery.readthedocs.org/en/latest/reference/celery.app.task.html?highlight=after_return#celery.app.task.Task.after_return
With Celery version 2.3.2 this approach has worked well for me:
class MyTask(celery.task.Task):
    abstract = True

    def after_return(self, status, retval, task_id, args, kwargs, einfo):
        if self.max_retries == self.request.retries:
            ...  # if max retries equals task retries, do something

@task(base=MyTask, default_retry_delay=5, max_retries=10)
def request(xml):
    ...  # your stuff here
I'm just going with this for now; it spares me the work of subclassing Task and is easily understood.
# auto-retry with delay as defined below. After that, hook is disabled.
@celery.shared_task(bind=True, max_retries=5, default_retry_delay=300)
def post_data(self, hook_object_id, url, event, payload):
    headers = {'Content-type': 'application/json'}
    try:
        r = requests.post(url, data=payload, headers=headers)
        r.raise_for_status()
    except requests.exceptions.RequestException as e:
        if self.request.retries >= self.max_retries:
            log.warning("Auto-deactivating webhook %s for event %s", hook_object_id, event)
            Webhook.objects.filter(object_id=hook_object_id).update(active=False)
            return False
        raise self.retry(exc=e)
    return True