Number of queries executed over psycopg2 connection - django

I would like to know the number of SQL queries that were executed on a psycopg2 connection.
Is there a way to get this number?
I would like to warn if an HTTP request produces too many statements.
I am running a Django application. If DEBUG is True, I have connection.queries, but I would like to get this value from a production server.
Update
I want numbers (statistics) from the production environment. This question is not about debugging a particular HTTP request.

Have a look at django-silk. It is a profiling tool that records metrics like response times and the number of queries.
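A minimal setup sketch, in case it helps (double-check the silk docs for the exact, current names; the URL prefix is just an example):

# settings.py
INSTALLED_APPS = [
    # ...
    'silk',
]

MIDDLEWARE = [
    'silk.middleware.SilkyMiddleware',
    # ...
]

# urls.py
from django.urls import include, path

urlpatterns = [
    path('silk/', include('silk.urls', namespace='silk')),
    # ...
]

After a few requests, the /silk/ UI lists each request together with its query count and timings.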
If you want to roll your own solution and you are using Django 2.0 or later, you can create a middleware with a connection wrapper. The documentation even showcases a QueryLogger class:
import time

from django.db import connection


class QueryLogger:

    def __init__(self):
        self.queries = []

    def __call__(self, execute, sql, params, many, context):
        current_query = {'sql': sql, 'params': params, 'many': many}
        start = time.time()
        try:
            result = execute(sql, params, many, context)
        except Exception as e:
            current_query['status'] = 'error'
            current_query['exception'] = e
            raise
        else:
            current_query['status'] = 'ok'
            return result
        finally:
            duration = time.time() - start
            current_query['duration'] = duration
            self.queries.append(current_query)


class QueryLoggingMiddleware:

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        ql = QueryLogger()
        with connection.execute_wrapper(ql):
            response = self.get_response(request)
        # do something with ql.queries here
        return response
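To get the production warning the question asks about, the middleware only has to inspect the count after the wrapped call. A minimal sketch reusing the QueryLogger above (the threshold of 50 and the logger name are arbitrary examples):

import logging

logger = logging.getLogger('query_count')

QUERY_WARNING_THRESHOLD = 50  # arbitrary example limit


class QueryCountMiddleware:

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        ql = QueryLogger()
        with connection.execute_wrapper(ql):
            response = self.get_response(request)
        if len(ql.queries) > QUERY_WARNING_THRESHOLD:
            # Emit a warning so the statistics show up in production logs.
            logger.warning('%s %s executed %d SQL queries',
                           request.method, request.path, len(ql.queries))
        return response

Unlike connection.queries, this works with DEBUG = False.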

The number of queries made in production and in development is the same, as long as your database and everything else in the environment are the same.
I recommend using Django Debug Toolbar as mentioned: see how many queries your view is making and rethink your code based on that. If you want to look at the performance of those queries, I recommend PostgreSQL's EXPLAIN command.
I usually copy the query and run it with EXPLAIN inside my PostgreSQL database shell. See this: http://recordit.co/rGZ2SAo7PX
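If you would rather stay in Python, the same plan can be fetched through Django's connection; a small sketch (the query is just a placeholder, and note that ANALYZE actually executes the statement):

from django.db import connection


def explain(sql, params=None):
    # Ask PostgreSQL for the execution plan of a captured query.
    with connection.cursor() as cursor:
        cursor.execute('EXPLAIN ANALYZE ' + sql, params)
        return [row[0] for row in cursor.fetchall()]


for line in explain('SELECT * FROM auth_user WHERE id = %s', [1]):
    print(line)

On Django 2.1+ a queryset can also do this directly via QuerySet.explain().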

Related

Django Rest framework GET request on db without related model

Let's say that we have a database with existing data; the data is updated from a bash script and there is no related model in Django for it. What is the best way to create an endpoint in Django so that a GET request can retrieve the data?
What I mean is that if there were a model, we could use something like:
class ModelList(generics.ListCreateAPIView):
    queryset = Model.objects.first()
    serializer_class = ModelSerializer
The workaround I tried was to create an APIView and do something like this inside it:
import psycopg2
from rest_framework.response import Response
from rest_framework.views import APIView


class RetrieveData(APIView):
    def get(self, request):
        conn = None
        try:
            conn = psycopg2.connect(host=..., database=..., user=..., password=..., port=...)
            cur = conn.cursor()
            cur.execute(f'Select * from ....')
            fetched_data = cur.fetchone()
            cur.close()
            res_list = [x for x in fetched_data]
            json_res_data = {"id": res_list[0],
                             "date": res_list[1],
                             "data": res_list[2]}
            return Response({"data": json_res_data})
        except Exception as e:
            return Response({"error": 'Error'})
        finally:
            if conn is not None:
                conn.close()
I do not believe that this is a good solution, though, and it is also a bit slow (~2 s per request). Apart from that, if, for example, many GET requests are made at the same time, isn't that going to create a problem on the DB instance, e.g. table locks?
So I was wondering what a better/best solution for this kind of problem would be.
Appreciate your time!
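A sketch of one simplification, assuming the table lives in the same database Django is already configured for (the table and column names below are placeholders): reuse Django's managed connection instead of opening a new psycopg2 connection on every request.

from django.db import connection
from rest_framework.response import Response
from rest_framework.views import APIView


class RetrieveData(APIView):
    def get(self, request):
        # Django's connection is already open, so there is no per-request connect cost.
        with connection.cursor() as cursor:
            cursor.execute('SELECT id, date, data FROM my_table ORDER BY id DESC LIMIT 1')
            row = cursor.fetchone()
        if row is None:
            return Response({"error": "no data"}, status=404)
        return Response({"data": {"id": row[0], "date": row[1], "data": row[2]}})

Plain SELECTs in PostgreSQL do not take locks that block other readers or writers, so concurrent GETs are mainly a connection-count concern rather than a locking one.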

How to turn a Django Rest Framework API View into an async one?

I am trying to build a REST API that will manage some machine learning classification tasks. I have written an API view which, when hit, will trigger the start of a classification task (such as training an SVM classifier with the data the user provided previously). However, this is a long-running task, so I would ideally not have the user wait once they have made a request to this view. Instead, I would like to start the task in the background and give them a response immediately. They can later view the results of the classification in a separate view (I haven't implemented that yet).
I am using ASGI_APPLICATION = 'mlxplorebackend.asgi.application' in settings.py.
Here's my API view in views.py
import asyncio
from concurrent.futures import ProcessPoolExecutor

from django import setup as SetupDjango
# ... other imports

loop = asyncio.get_event_loop()


def DummyClassification():
    result = sum(i * i for i in range(10 ** 7))
    print(result)
    return result

# ... other API views


class TaskExecuteView(APIView):
    """
    Once an API call is made to this view, the classification algorithm will start being processed.
    Depends on:
    1. Parser for the classification algorithm type and parameters
    2. Classification algorithm implementation
    """
    def get(self, request, taskId, *args, **kwargs):
        try:
            task = TaskModel.objects.get(taskId = taskId)
        except TaskModel.DoesNotExist:
            raise Http404
        else:
            # this is basically the classification task for now
            # need to turn this to an async view
            with ProcessPoolExecutor(initializer = SetupDjango) as pool:
                loop.run_in_executor(pool, DummyClassification)

            return Response({ "message": "The task with id: {} has been started".format(task.taskId) }, status = status.HTTP_200_OK)
The problem I am facing is the following:
When I do not use with ProcessPoolExecutor(initializer = SetupDjango) as pool:, i.e. without the initializer, I get django.core.exceptions.AppRegistryNotReady: Apps aren't loaded yet. (full traceback at: https://paste.ubuntu.com/p/ctjmFNYMXW/)
When I do use the initializer, the view no longer remains async; it gets blocked. The response returns after the task is completed, which takes about 5 seconds on my machine. I do realize I am not really making use of asyncio.sleep() inside my DummyClassification() function, but I can't figure out how to do so.
I am guessing this is not the way to do it, so any suggestions would be appreciated. I would like to avoid Celery if I can, since it seems a tad too complicated for me.
Edit:
If I get rid of ProcessPoolExecutor() and simply do loop.run_in_executor(None, DummyClassification), it works as expected, but then only one worker thread is working on the task, which doesn't seem remotely ideal for a classification task.
This was a ride. I first went through the pain of setting up Celery, only to find out that the original problem of the classification task using a single CPU core remained. Then I switched to django-rq with Redis, and it is currently working as expected.
from .tasks import Pipeline


class TaskExecuteView(APIView):
    """
    Once an API call is made to this view, the classification algorithm will start being processed.
    Depends on:
    1. Parser for the classification algorithm type
    2. Classification algorithm implementation
    """
    def get(self, request, taskId, *args, **kwargs):
        try:
            task = TaskModel.objects.get(taskId = taskId)
        except TaskModel.DoesNotExist:
            raise Http404
        else:
            Pipeline.delay(taskId)  # this is async now ✔

            # mark this as an in-progress task
            TaskModel.objects.filter(taskId = taskId).update(inProgress = True)

            return Response({ "message": "The task with id: {}, title: {} has been started".format(task.taskId, task.taskTitle) }, status = status.HTTP_200_OK)
tasks.py
from django_rq import job


@job('default', timeout=3600)
def Pipeline(taskId):
    # classification task
    ...
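For completeness, django-rq also expects a queue definition in settings and a separate worker process; a minimal sketch assuming Redis runs locally:

# settings.py
RQ_QUEUES = {
    'default': {
        'HOST': 'localhost',
        'PORT': 6379,
        'DB': 0,
    },
}

# Then start one or more workers next to the web process:
#   python manage.py rqworker default

Each worker is a separate OS process, so a CPU-bound task like this one can use additional cores by running more workers.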

google-ml-engine custom prediction routine error responses

I have a custom prediction routine in google-ml-engine. It works very well.
I am now doing input checking on the instance data and want to return error responses from my predict routine.
The example at https://cloud.google.com/ai-platform/prediction/docs/custom-prediction-routines
raises exceptions on input errors, etc. However, when this happens, the response body always contains {'error': 'Prediction failed: unknown error'}. I can see that the correct errors are being logged in the Google Cloud console, but the HTTP response is always the same unknown error.
My question is:
How do I make the custom prediction routine return a proper error code and error message string?
Instead of returning a prediction, I can return an error string/code in the prediction, but it ends up in the predictions part of the response, which seems hacky and doesn't surface any of the Google errors, e.g. the ones based on instance size.
root:test_deployment.py:35 {'predictions': {'error': "('Instance does not include required sensors', 'occurred at index 0')"}}
What's the best way to do this?
Thanks!
David
Please take a look at the following code: I created a _validate function that is called inside predict, and I use a custom Exception class.
Basically, I validate the instances before I call the model's predict method, and handle the exception.
There may be some overhead to the response time when doing this validation, which you need to test for your use case.
from googleapiclient import discovery

requests = [
    "god this episode sucks",
    "meh, I kinda like it",
    "what were the writer thinking, omg!",
    "omg! what a twist, who would'v though :o!",
    99999
]

# Body format expected by the online prediction service.
request_data = {'instances': requests}

# PROJECT, MODEL_NAME and VERSION_NAME are defined elsewhere.
api = discovery.build('ml', 'v1')
parent = 'projects/{}/models/{}/versions/{}'.format(PROJECT, MODEL_NAME, VERSION_NAME)
parent = 'projects/{}/models/{}'.format(PROJECT, MODEL_NAME)  # or target the model's default version
response = api.projects().predict(body=request_data, name=parent).execute()
{'predictions': [{'Error code': 1, 'Message': 'Invalid instance type'}]}
Custom Prediction class:
import os
import pickle
import logging
from datetime import date

import numpy as np
import tensorflow.keras as keras


class CustomModelPredictionError(Exception):
    def __init__(self, code, message='Error found'):
        self.code = code
        self.message = message  # you could add more args

    def __str__(self):
        return str(self.message)


def isstr(s):
    return isinstance(s, str) or isinstance(s, bytes)


def _validate(instances):
    for instance in instances:
        if not isstr(instance):
            raise CustomModelPredictionError(1, 'Invalid instance type')
    return instances


class CustomModelPrediction(object):

    def __init__(self, model, processor):
        self._model = model
        self._processor = processor

    def _postprocess(self, predictions):
        labels = ['negative', 'positive']
        return [
            {
                "label": labels[int(np.round(prediction))],
                "score": float(np.round(prediction, 4))
            } for prediction in predictions]

    def predict(self, instances, **kwargs):
        try:
            instances = _validate(instances)
        except CustomModelPredictionError as c:
            return [{"Error code": c.code, "Message": c.message}]
        else:
            preprocessed_data = self._processor.transform(instances)
            predictions = self._model.predict(preprocessed_data)
            labels = self._postprocess(predictions)
            return labels

    @classmethod
    def from_path(cls, model_dir):
        model = keras.models.load_model(
            os.path.join(model_dir, 'keras_saved_model.h5'))
        with open(os.path.join(model_dir, 'processor_state.pkl'), 'rb') as f:
            processor = pickle.load(f)
        return cls(model, processor)
Complete code in this notebook.
If it is still relevant to you, I found a way by using Google's internal libraries (not sure whether it would be endorsed by Google, though).
The AI Platform custom prediction wrapping code only returns a custom error message if the exception thrown is a specific one from their internal library.
It might also not be super reliable, as you would have very little control if Google decides to change it.
class Predictor(object):

    def predict(self, instances, **kwargs):
        # Your prediction code here

        # This is an internal google library, it should be available at prediction time.
        from google.cloud.ml.prediction import prediction_utils
        raise prediction_utils.PredictionError(0, "Custom error message goes here")

    @classmethod
    def from_path(cls, model_dir):
        # Your logic to load the model here
        ...
You would get the following message in your HTTP response
Prediction failed: Custom error message goes here

Logging request timeouts on Django + Gunicorn + Heroku

We have a Django app running Gunicorn with sync workers, deployed on Heroku. Our request response times show several requests that hit 30 s (and die), which is the default Gunicorn timeout.
What is the best way to log these requests and analyze the timeout? Gunicorn doesn't seem to provide a hook for catching these timeouts, at least not something that's obvious.
One rather rough way to do it is to have a "watchdog" timer that interrupts the process after, say, 25 seconds. Once you have an idea of which procs are slow, you can refine the data to figure out what's going on.
Example:
import signal

def timeout(_signum, _frame):
    print('TIMEOUT')

signal.signal(signal.SIGALRM, timeout)
signal.alarm(1)  # send SIGALRM in 1 second

print('waiting')
signal.pause()
print('done')
Another approach is to fire off a Thread which pokes the main code after a certain amount of elapsed time. It has several caveats -- be sure to read the ActiveState link.
Here's one implementation by Aaron Swartz from ActiveState.com
import sys
import threading

class TimeoutError(Exception): pass

def timelimit(timeout):
    def internal(function):
        def internal2(*args, **kw):
            class Calculator(threading.Thread):
                def __init__(self):
                    threading.Thread.__init__(self)
                    self.result = None
                    self.error = None

                def run(self):
                    try:
                        self.result = function(*args, **kw)
                    except:
                        self.error = sys.exc_info()[0]

            c = Calculator()
            c.start()
            c.join(timeout)
            if c.is_alive():
                raise TimeoutError
            if c.error:
                raise c.error
            return c.result
        return internal2
    return internal
https://github.com/benoitc/gunicorn/pull/768/files added a worker_abort signal which is what I'm using in this case.
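For reference, a minimal sketch of that hook in a Gunicorn config file (logging the PID and dumping the stack with faulthandler are my own choices, not something the PR prescribes):

# gunicorn.conf.py
import faulthandler
import sys


def worker_abort(worker):
    # Called in the worker process when it receives SIGABRT,
    # which is what the master sends when a sync worker times out.
    worker.log.critical('Worker %s timed out; dumping stack', worker.pid)
    faulthandler.dump_traceback(file=sys.stderr)

The dumped stack shows which view the worker was stuck in when the 30-second timeout hit.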

How to have Django give an HTTP response before continuing on to complete a task associated with the request?

In my django-piston API, I want to yield/return an HTTP response to the client before calling another function that will take quite some time. How do I make the yield give an HTTP response containing the desired JSON and not a string relating to the creation of a generator object?
My piston handler method looks like so:
def create(self, request):
    data = request.data

    # *other operations......................*

    incident.save()

    response = rc.CREATED
    response.content = {"id": str(incident.id)}
    yield response

    manage_incident(incident)
Instead of the response I want, like:
{"id":"13"}
The client gets a string like this:
"<generator object create at 0x102c50050>"
EDIT:
I realise that using yield was the wrong way to go about this. In essence, what I am trying to achieve is that the client receives a response right away, before the server moves on to the time-costly manage_incident() function.
This doesn't have anything to do with generators or yielding, but I've used the following code and decorator to have things run in the background while returning the client an HTTP response immediately.
Usage:
@postpone
def long_process():
    # do things...
    ...

def some_view(request):
    long_process()
    return HttpResponse(...)
And here's the code to make it work:
import atexit
import queue
import threading

from django.core.mail import mail_admins


def _worker():
    while True:
        func, args, kwargs = _queue.get()
        try:
            func(*args, **kwargs)
        except:
            import traceback
            details = traceback.format_exc()
            mail_admins('Background process exception', details)
        finally:
            _queue.task_done()  # so we can join at exit


def postpone(func):
    def decorator(*args, **kwargs):
        _queue.put((func, args, kwargs))
    return decorator


_queue = queue.Queue()
_thread = threading.Thread(target=_worker)
_thread.daemon = True
_thread.start()


def _cleanup():
    _queue.join()  # so we don't exit too soon


atexit.register(_cleanup)
Perhaps you could do something like this (be careful though):
import threading

def create(self, request):
    data = request.data

    # do stuff...

    t = threading.Thread(target=manage_incident,
                         args=(incident,))
    t.daemon = True
    t.start()

    return response
Has anyone tried this? Is it safe? My guess is that it's not, mostly because of concurrency issues, but also because if you get a lot of requests you might also end up with a lot of threads running at once (since they might be running for a while). Still, it might be worth a shot.
Otherwise, you could just add the incident that needs to be managed to your database and handle it later via a cron job or something like that.
I don't think Django is built for concurrency or for very time-consuming operations.
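As a rough sketch of that cron-job idea (the handled flag, the app name, and the location of manage_incident are all hypothetical):

# myapp/management/commands/process_incidents.py
from django.core.management.base import BaseCommand

from myapp.models import Incident            # hypothetical model with a `handled` flag
from myapp.incidents import manage_incident  # hypothetical home of the existing function


class Command(BaseCommand):
    help = 'Process incidents queued by the API; meant to be run from cron.'

    def handle(self, *args, **options):
        for incident in Incident.objects.filter(handled=False):
            manage_incident(incident)
            incident.handled = True
            incident.save(update_fields=['handled'])

Cron would then run python manage.py process_incidents every few minutes.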
Edit
Someone has tried it, and it seems to work.
Edit 2
These kinds of things are often better handled by background jobs. The Django Background Tasks library is nice, but there are others, of course.
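A minimal sketch with django-background-tasks (the schedule value is arbitrary, and Incident/manage_incident stand in for the question's own code):

from background_task import background

from myapp.models import Incident            # hypothetical model location
from myapp.incidents import manage_incident  # hypothetical home of the existing function


@background(schedule=5)  # run roughly 5 seconds after being queued
def manage_incident_later(incident_id):
    # Task arguments must be serializable, so pass the id rather than the model instance.
    manage_incident(Incident.objects.get(id=incident_id))

The view calls manage_incident_later(incident.id) and returns immediately; a separate process started with python manage.py process_tasks picks the queued task up from the database.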
You've turned your view into a generator thinking that Django will pick up on that fact and handle it appropriately. Well, it won't.
def create(self, request):
    return HttpResponse(real_create(request))
EDIT:
Since you seem to be having trouble... visualizing it...
def stuff():
    print(1)
    yield 'foo'
    print(2)

for i in stuff():
    print(i)
output:
1
foo
2