Reading Celery task progress using AJAX polling - django

I have a simple Celery task that write some progress data in the database. I need to read this progress update using a django view to give the update to the user.
I used my own tables to write the progress and read it using AJAX polling from client side. Now it's not working and I don't know the reason.
My database backend is PostgreSQL. I tried changing the transaction isolation level using the following (in the read view):
from django.db import transaction
#4 is READ UNCOMITTED
transaction.connections.all()[0].connection.set_isolation_level(4)
I am not sure if this changes the isolation level for a new connection to the database or the one the current transaction is using, but it doesn't seem to work. no progress data can be read until the task has finished and transaction is committed.
Here is second method I tried.
I also found update_state, I write all the progress updates using update_state, but it doesn't seem to be actually written in the database. I run celerycam and configured celery to send events with -E argument.
I want to know what's the proper way to update progress day and retrieve it.
Thanks you.

After some Googling I found out that "READ UNCOMMITTED" is not implemented in PostgreSQL and most probably won't be implemented in the future.
I also found an extension that allows you to read dirty data. It's part of project enter link description here, but this forced me to use raw sql to get the data I wanted.

Related

Django: Ignore transaction for given update

I have a long-running process that should be atomic (Larger update of an application's data). However, during the update, I'd like to let the user know about the progress.
How can I update a model from within the transactioned code, such that it is not part of said transaction?
One potential solution would be to use non-DB storage, such as files, or a second DB connection with the same credentials. Both don't feel like the "right" way to do it...

Flask-Login user status monitoring

I'm developing a small website with Flask & Flask-Login. I need an admin view to monitor all user's online status. I added an is-online column in user db collection and try to update it. But I didn't find any callbacks to handle session expires. How to solve it or any better idea?
Thanks!
FYI, Counting Online Users with Redis | Flask (A Python Microframework) - http://flask.pocoo.org/snippets/71/.
You could get away with checking if users last activity time is bigger(older) than session life time.
if that's the case, you will go on an update their is_online status.
the way i have handled the problem in my application was, since i had a last_activity field in db for each user, to log when they have done what they have done, i could check that value vs session life time.
One very crude method is to create a separate stack that pushes and pops when a user logs in. Assuming that session id and userid on your db is not tied together (i.e., you have separate session id and user id), you can maintain a ledger of sorts and push and pop as sessions are created and destroyed.
You will have to put special emphasis on users running multiple sessions on multiple devices...which is why i put a caveat saying this is a rather crude method.

Django database; how to download huge data in csv format

I have setup my database in Django in which I have huge amount of data. The task is to download all the data at a time in csv format. The problem which I am facing here is when the data size (in number of table rows) is upto 2000, I am able to download it but when number of rows reaches to more than 5k, it throws an error, "Gateway timeout". How to handle such issue. There is no table indexing as of now.
Also, when there is 2K data available, it takes around 18sec to download. So how this can be optimized.
First, make sure the code that is generating the CSV is as optimized as possible.
Next, the gateway timeout is coming from your front end proxy; so simply increase the timeout there.
However, this is a temporary reprieve - as your data set grows, this timeout will be exhausted and you'll keep getting these errors.
The permanent solution is to trigger a separate process to generate the CSV in the background, and then download it once its finished. You can do this by using celery or rq which are both ways to queue tasks for execution (and then collect the results at a later time).
If you are currently using HttpResponse from django.http then you could try using StreamingHttpResponse instead.
Failing that, you could try querying the database directly. For example, if you use the MySql database backend, these answers might help you:
dump-a-mysql-database-to-a-plaintext-csv-backup-from-the-command-line
As for the speed of the transaction, you could experiment with other database backends. However, if you need to do this often enough for the speed to be a major issue then there may be something else in the larger process which should be optimized instead.

Django - logging each view's action

I'm thinking of creating a log system for my django web application. The web application is quite comprehensive in its use (covers all aspects of a business's processes) so I'd like to track every event that happens. Specifically, I'd like to log every view that runs and not just the "main" ones and, potentially, log what is happening within the view as its executed.
While I'm in the "idea" stage of the logging system, I've quickly hit a few questions that leave me unsure how to proceed. Here are the main questions I have:
I'm thinking of logging all of the events in the same MySQL database that the main web app holds its data. The concern I have is bloating the MySQL database into a massive DB. Also, if the DB crashes or is destroyed somehow (yes I have backups) I'll loose my log too which blows away any ability to track down the problem. Do I use a seperate DB or just go with text files?
How granular do I go? Initially I was thinking of simply logging things like, "Date - In view myView". However, as I'm thinking about it, it would be nice to log all the stuff that happens within the view. Doing this could make the log massive! and would also make my code ugly with so many log entry lines mixed into the code. This kind of detail:
Date - entered view myView
Date - in view myView, retrieved object myObject from the DB
Date - in view myView, setting myObject field myField to myNewValue
Date - leaving myView
Those are my main thoughts at this point. Any advice on this front?
Thanks
I think the best and right way is to create your own custom middleware where you can log actually everything you need.
Here are some links on the subject:
middleware snippets
http://djangosnippets.org/snippets/2624/
http://djangosnippets.org/snippets/290/
http://djangosnippets.org/snippets/264/
django-logging-middleware (pretty old by may give you an idea)
django-request
django.db.backends logging
Is there a Django middleware/plugin that logs all my requests in a organized fashion?
Django verbose request logging
log all sql queries
django orm, how to view (or log) the executed query?
Also, consider using sentry error logging and aggregation platform instead of writing logs into the database. FYI, see using a database for logging.
If you want to log any action run in every view, you can for example, replace entered view A and exited view A by a line in these words: view A - 147ms.
As alecxe stated, you can log requests/SQL, there are plenty of ways to do it with middleware. About database (object) actions, you can tie individual saves, updates and deletes with signals.
For bulk updates and deletes, you could (it's not a clean way but it would work) monkey-patch manager and queryset methods to add logging.
This way you can log actions rather than SQL.
I would see lines like this:
[2013/09/11 15:11:12.0153] view app.module.view 200 148ms
[2013/09/11 15:11:12.0189] orm save:auth.User,id=1 3ms
This is a quick and dirty proposal, but, maybe it's worth it.

How can I limit database query time during web requests?

We've got a pretty typical django app running on postgresql 9.0. We've recently discovered some db queries that have run for over 4 hours, due to inefficient searches in the admin interface. While we plan to fix these queries, as a safeguard we'd like to artificially constrain database query time to 15 seconds--but only in the context of a web request; batch jobs and celery tasks should not be bounded by this constraint.
How can we do that? Or is it a terrible idea?
The best way to do this would be to set up a role/user that is only used to run the web requests, then set the statement_timeout on that role.
ALTER ROLE role_name SET statement_timeout = 15000
All other roles will use the global setting of statement_timeout (which is disabled in a stock install).
You will need to handle this manually. That is checking for the 15 second rule and killing the queries that violate it.
Query pg_stat_activity and find the violators and issue calls to pg_terminate_backend(procpid) to kill the offenders.
Something like this in a loop:
SELECT pg_terminate_backend(pg_stat_activity.procpid)
FROM pg_stat_activity
WHERE pg_stat_activity.datname = 'TARGET_DB'
AND usename = 'WEBUSERNAME'
AND (now()-query_start) > '00:00:15';
As far as the timing goes, you could pass all of your queries through a class which, on instantiation, spawns two threads: one for the query, and one for a timer. If the timer reaches 15 seconds, then kill the thread with the query.
As far as figuring out if the query is instantiated from a web request, I don't know enough about Django to be able to help you. Simplistically, I would say, in your class that handles your database calls, an optional parameter to the constructor could be something like context, which could be http in the event of a web request and "" for anything else.