How can I limit database query time during web requests? - django

We've got a pretty typical django app running on postgresql 9.0. We've recently discovered some db queries that have run for over 4 hours, due to inefficient searches in the admin interface. While we plan to fix these queries, as a safeguard we'd like to artificially constrain database query time to 15 seconds--but only in the context of a web request; batch jobs and celery tasks should not be bounded by this constraint.
How can we do that? Or is it a terrible idea?

The best way to do this would be to set up a role/user that is only used to run the web requests, then set the statement_timeout on that role.
ALTER ROLE role_name SET statement_timeout = 15000; -- value is in milliseconds (15 s)
All other roles will use the global setting of statement_timeout (which is disabled in a stock install).
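
A minimal sketch of wiring the web tier to that role through Django's DATABASES setting; the role and database names here are illustrative, not prescribed. Batch jobs and celery workers would run with a different settings module (or environment variable) that points at an unrestricted role:

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "mydb",
        # this role was configured with: ALTER ROLE webuser SET statement_timeout = 15000
        "USER": "webuser",
        "PASSWORD": "secret",
        "HOST": "localhost",
    }
}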

You will need to handle this manually: check for the 15-second rule and kill the queries that violate it.
Query pg_stat_activity to find the violators, then issue calls to pg_terminate_backend(procpid) to kill the offenders.
Something like this in a loop:
SELECT pg_terminate_backend(pg_stat_activity.procpid)
FROM pg_stat_activity
WHERE pg_stat_activity.datname = 'TARGET_DB'
AND usename = 'WEBUSERNAME'
AND (now()-query_start) > '00:00:15';
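
If you go this route, a small watchdog in Python with psycopg2 could run that query periodically; the connection details below are assumptions, and note that procpid is the column name on PostgreSQL 9.0/9.1 (it was renamed to pid in 9.2):

import time
import psycopg2

KILL_SQL = """
    SELECT pg_terminate_backend(pg_stat_activity.procpid)
    FROM pg_stat_activity
    WHERE pg_stat_activity.datname = %s
      AND usename = %s
      AND (now() - query_start) > interval '15 seconds';
"""

conn = psycopg2.connect(dbname="postgres", user="postgres")
conn.autocommit = True  # pg_terminate_backend needs no transaction
while True:
    with conn.cursor() as cur:
        cur.execute(KILL_SQL, ("TARGET_DB", "WEBUSERNAME"))
    time.sleep(5)  # re-check every few seconds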

As far as the timing goes, you could pass all of your queries through a class which, on instantiation, spawns two threads: one for the query, and one for a timer. If the timer reaches 15 seconds, then kill the thread with the query.
As far as figuring out whether the query was initiated from a web request, I don't know enough about Django to help you. Simplistically, the class that handles your database calls could take an optional constructor parameter such as context, which would be "http" in the event of a web request and "" for anything else.
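
A rough sketch of that wrapper, with one caveat made explicit: Python cannot actually kill a running thread, so on timeout the query is merely abandoned on the client side (server-side enforcement such as statement_timeout is still needed to stop the backend). Names here are illustrative:

import threading

def run_with_timeout(query_fn, timeout=15):
    result = {}

    def worker():
        result["value"] = query_fn()  # query_fn wraps the actual database call

    t = threading.Thread(target=worker)
    t.daemon = True  # don't block interpreter shutdown on an abandoned query
    t.start()
    t.join(timeout)
    if t.is_alive():
        raise RuntimeError("query exceeded %d seconds" % timeout)
    return result.get("value")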

Related

Generate automatic records in a table in Django even if the system is not being used

Does anyone know a way to generate a record in a table without a user actively using the system?
I need to generate something like a notification or reminder, built from data obtained from other tables, similar to a report.
Thank you
To run periodic tasks, you will need some sort of task scheduler, such as celery or huey. With one in place, you can create and save instances of whatever model you have in mind from the task code, and the scheduler will repeat it periodically.
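
A minimal sketch with celery beat; the Notification model, module path and schedule are illustrative assumptions:

from celery import Celery
from celery.schedules import crontab

app = Celery("proj")  # your project's celery instance

@app.task
def generate_daily_reminder():
    # import inside the task so Django's app registry is ready first
    from myapp.models import Notification  # hypothetical model
    # gather data from other tables, then save the reminder/report record
    Notification.objects.create(text="daily summary ...")

app.conf.beat_schedule = {
    "daily-reminder": {
        "task": "tasks.generate_daily_reminder",  # module path of this file
        "schedule": crontab(hour=6, minute=0),    # every day at 06:00
    },
}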

Django Celery - Creating duplicate users due to multiple workers

I am facing a weird issue with celery in production. Currently, the production server has 4 celery workers which handle all the tasks registered by my django app. No custom queues are defined. The workers are basically 4 separate supervisor conf files.
Now, in my app I am handling facebook webhook data, and I want a user with a specific FacebookID to be only created once on my backend. But, recently I checked and found out that there are users who have the same FacebookID, which should not have happened.
What I think happened: say a user with FacebookID 666 sends me webhook data. A task is created which will create a new user instance in my database with FacebookID 666. Before that user is created in my database, the same user hits me with more data, which also creates a task, but under a different worker, and thus I got two users with the same FacebookID.
Is there any way I can configure celery to handle a user with a specific FacebookID to create tasks only in ONE worker? Or have I completely misjudged the situation over here?
Essentially, you need a user-level distributed lock to prevent multiple workers from working on the same user. There are several ways to accomplish this, the most straightforward being a database such as MySQL or redis. In MySQL, the first process would transactionally (1) check for an existing row in a table keyed by the user ID (e.g., email or other unique identifier), (2) if no row exists, create that row, and (3) if a row exists, return early without doing anything. You can also do this in redis using Redlock or, for smaller systems, just SETNX; a sketch of the SETNX approach follows.
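
A minimal sketch of the SETNX approach with redis-py and Django's get_or_create; the AppUser model and key format are illustrative assumptions:

import redis

from myapp.models import AppUser  # hypothetical model with a facebook_id field

r = redis.Redis()

def create_user_once(facebook_id):
    lock_key = "lock:fb:%s" % facebook_id
    # nx=True: set only if absent; ex=60: expire the lock so a crashed
    # worker cannot hold it forever
    if not r.set(lock_key, "1", nx=True, ex=60):
        return  # another worker is already handling this FacebookID
    try:
        # a unique constraint on facebook_id is a sensible second line of defense
        AppUser.objects.get_or_create(facebook_id=facebook_id)
    finally:
        r.delete(lock_key)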

Django - logging each view's action

I'm thinking of creating a log system for my django web application. The web application is quite comprehensive in its use (covers all aspects of a business's processes) so I'd like to track every event that happens. Specifically, I'd like to log every view that runs and not just the "main" ones and, potentially, log what is happening within the view as its executed.
While I'm in the "idea" stage of the logging system, I've quickly hit a few questions that leave me unsure how to proceed. Here are the main questions I have:
I'm thinking of logging all of the events in the same MySQL database that the main web app holds its data in. The concern I have is bloating the MySQL database into a massive DB. Also, if the DB crashes or is destroyed somehow (yes, I have backups) I'll lose my log too, which blows away any ability to track down the problem. Do I use a separate DB or just go with text files?
How granular do I go? Initially I was thinking of simply logging things like "Date - In view myView". However, as I think about it, it would be nice to log all the stuff that happens within the view. Doing this could make the log massive, and it would also make my code ugly with so many log entry lines mixed into it. This kind of detail:
Date - entered view myView
Date - in view myView, retrieved object myObject from the DB
Date - in view myView, setting myObject field myField to myNewValue
Date - leaving myView
Those are my main thoughts at this point. Any advice on this front?
Thanks
I think the best approach is to create your own custom middleware, where you can log practically everything you need; a minimal sketch follows the links below.
Here are some links on the subject:
middleware snippets
http://djangosnippets.org/snippets/2624/
http://djangosnippets.org/snippets/290/
http://djangosnippets.org/snippets/264/
django-logging-middleware (pretty old but may give you an idea)
django-request
django.db.backends logging
Is there a Django middleware/plugin that logs all my requests in an organized fashion?
Django verbose request logging
log all sql queries
django orm, how to view (or log) the executed query?
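
Such a middleware might look like this minimal sketch (old-style MIDDLEWARE_CLASSES API, which matches the Django versions those snippets target; the logger name is an assumption):

import logging
import time

logger = logging.getLogger("request_audit")

class RequestLoggingMiddleware(object):
    def process_request(self, request):
        request._log_start = time.time()

    def process_response(self, request, response):
        elapsed_ms = (time.time() - getattr(request, "_log_start", time.time())) * 1000
        logger.info("%s %s %s %.0fms", request.method, request.path,
                    response.status_code, elapsed_ms)
        return response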
Also, consider using the sentry error logging and aggregation platform instead of writing logs into the database. FYI, see "using a database for logging".
If you want to log every action run in every view, you can, for example, replace "entered view A" and "exited view A" with a single line like: view A - 147ms.
As alecxe stated, you can log requests/SQL; there are plenty of ways to do it with middleware. For database (object) actions, you can hook into individual saves, updates and deletes with signals (see the sketch below).
For bulk updates and deletes, you could (it's not a clean way but it would work) monkey-patch manager and queryset methods to add logging.
This way you can log actions rather than SQL.
I would see lines like this:
[2013/09/11 15:11:12.0153] view app.module.view 200 148ms
[2013/09/11 15:11:12.0189] orm save:auth.User,id=1 3ms
This is a quick and dirty proposal, but, maybe it's worth it.
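
For the signal side, a minimal sketch that produces lines like the ones above; the logger name and message format are assumptions:

import logging

from django.db.models.signals import post_save, post_delete
from django.dispatch import receiver

logger = logging.getLogger("audit")

@receiver(post_save)  # no sender given, so this fires for every model
def log_save(sender, instance, created, **kwargs):
    action = "create" if created else "save"
    logger.info("orm %s:%s.%s,id=%s", action, sender._meta.app_label,
                sender._meta.model_name, instance.pk)

@receiver(post_delete)
def log_delete(sender, instance, **kwargs):
    logger.info("orm delete:%s.%s,id=%s", sender._meta.app_label,
                sender._meta.model_name, instance.pk)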

Cronjob: Web Service query

I have a cronjob that runs every hour and parses 150,000+ records. Each record is summarized individually in MySQL tables. I use two web services to retrieve user information:
User demographics (IP, country, city, etc.)
Phone information (whether landline or cell phone and, if cell, the carrier)
For each record I check whether I already have the information, and if not I call these web services. After tracing my code I found that each of these calls takes 2 to 4 seconds, which makes my cronjob very slow, so I can't compile statistics on time.
Is there a way to make these web service faster?
Thanks
Simple: get the data locally and use Melissa Data:
for ip: http://w10.melissadata.com/dqt/websmart/ip-locator.htm
for phone: http://www.melissadata.com/fonedata.html
You can also cache the results using memcache or APC, which will make things faster since the job does not have to request the data from the API or database every time; a caching sketch follows.
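
For instance, a minimal caching sketch with python-memcached (the key scheme and one-day TTL are assumptions; the same pattern applies to APC):

import memcache

mc = memcache.Client(["127.0.0.1:11211"])

def cached_lookup(kind, key, fetch_fn, ttl=86400):
    cache_key = "%s:%s" % (kind, key)
    value = mc.get(cache_key)
    if value is None:
        value = fetch_fn(key)  # the slow 2-4 second web-service call
        mc.set(cache_key, value, time=ttl)
    return value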
A couple of ideas... if the same users are returning, caching the data in another table would be very helpful... you would only look it up once and have it for returning users. Upon re-reading the question it looks like you are doing that.
Another option would be to spawn new threads when you need to do the look-ups. This could be a new thread for each request, or if this is not feasible you could have n service threads ready to do the look-ups and update the results.
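
A minimal sketch of the thread-pool idea with concurrent.futures; lookup_demographics and lookup_phone are hypothetical wrappers around the two services:

from concurrent.futures import ThreadPoolExecutor

def lookup_demographics(ip):
    ...  # call the demographics web service (2-4 seconds)

def lookup_phone(number):
    ...  # call the phone-info web service (2-4 seconds)

def enrich(record):
    # run the two slow calls concurrently instead of back to back
    with ThreadPoolExecutor(max_workers=2) as pool:
        demo = pool.submit(lookup_demographics, record["ip"])
        phone = pool.submit(lookup_phone, record["phone"])
        record["demographics"] = demo.result()
        record["phone_info"] = phone.result()
    return record

records = [{"ip": "203.0.113.7", "phone": "+15551230000"}]  # illustrative data
with ThreadPoolExecutor(max_workers=20) as pool:  # n service threads
    enriched = list(pool.map(enrich, records))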

Any job queue systems that allow scheduling jobs by date?

I have a Django application.
One of my models looks like this:
class MyModel(models.Model):
    def house_cleaning(self):
        # clean up the data of this model instance
        ...
Every time I update an instance of MyModel, I need to clean up its data N days later. So I'd like to schedule a job to call
this_instance.house_cleaning()
N days from now.
Is there any job queue that would allow me to:
Integrate well with Django - allow me to call a method of individual model instances
Only run jobs that are scheduled to run today
Ideally handle failures gracefully
Thanks
django-chronograph might be good for your use case. If you write your cleanup jobs as Django management commands, you can then schedule them to run at some time. It runs using unix cron behind the scenes.
Is there any reason why a cron job wouldn't work? Or something like django-cron that behaves the same way? It's pretty easy to write stand-alone Django scripts. If you want to trigger house cleaning on some change to your model after a certain number of days, why not add a date field to the model that is set to N days in the future whenever the job needs to be scheduled? You could run a script daily which pulls all records where the date is <= today, calls each instance's house_cleaning() method, and then clears the date field. If an exception is raised during the process, it's easy enough to log it or dispatch an email. A minimal sketch of that script follows.
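
Here is that daily script sketched as a Django management command; the cleanup_due_date field name is an illustrative assumption:

import logging
from datetime import date

from django.core.management.base import BaseCommand

from myapp.models import MyModel  # adjust to your app

logger = logging.getLogger(__name__)

class Command(BaseCommand):
    help = "Run house_cleaning() on every instance whose cleanup date has arrived"

    def handle(self, *args, **options):
        due = MyModel.objects.filter(cleanup_due_date__lte=date.today())
        for instance in due:
            try:
                instance.house_cleaning()
                instance.cleanup_due_date = None  # clear the flag once done
                instance.save(update_fields=["cleanup_due_date"])
            except Exception:
                logger.exception("house_cleaning failed for pk=%s", instance.pk)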