Django: Ignore transaction for given update - django

I have a long-running process that should be atomic (a large update of an application's data). However, during the update, I'd like to let the user know about the progress.
How can I update a model from within the transactional code, such that the update is not part of said transaction?
One potential solution would be to use non-DB storage, such as files, or a second DB connection with the same credentials. Neither feels like the "right" way to do it...

Related

What is the best practice to write an API for an action that affects multiple tables?

Consider the example use case as below.
You need to invite a Company as your connection. The sub-actions that need to happen in this situation are:
A Company needs to be created by adding an entry to the Company table.
A User account needs to be created for the staff member to log in by creating an entry in the User table.
A Staff object is created to ensure that the User has access to the Company by creating an entry in the Staff table.
The invited company is related to the inviting company, so a relation similar to friendship is created to connect the two companies by creating an entry in the Connection table.
An Invitation object is created to store the information as to who invited whom onto the system, with other information like invitation time, invite message, etc. For this, an entry is created in the Invitation table.
An email needs to be sent to the user to accept the invitation and join by setting a password.
As you can see, entries are to be made in five tables.
Is it a good practice to do all this in a single API call?
If not, what are the other options?
How do I maintain data integrity if it is to be split into multiple APIs?
If the actions need to be atomic, then it's definitely best to do this in a single API call. Otherwise, you run the risk of someone not completing all the tasks required and leaving the resources in a potentially conflicting state.
That said, you're not updating a single resource, so this isn't a good fit for a single RESTful resource creation call (e.g., POST /companyInvitations) -- as all these other things being created and stitched together might lead to quite a bit of confusion.
If the action you're doing is "inviting a Company", then one option is to use Google's "custom method" syntax (POST /resources/1234:action) as defined in AIP-136. In this case, you might do POST /companies/1234:invite which says "I want to invite Company #1234 to be my connection".
Under the hood, this might atomically upsert (create if resources don't already exist) all the right things that you've listed out.
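To make the "atomic under the hood" idea concrete, here is a minimal sketch using Python's standard sqlite3 module in place of the real database layer. The table names, column names, and the invite_company helper are all hypothetical; the only point is that every row is written inside one transaction, so a failure in any step leaves nothing behind.

```python
import sqlite3

# Hypothetical schema standing in for the five tables described in the question.
SCHEMA = """
CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE);
CREATE TABLE staff (user_id INTEGER NOT NULL, company_id INTEGER NOT NULL);
CREATE TABLE connections (from_company_id INTEGER NOT NULL, to_company_id INTEGER NOT NULL);
CREATE TABLE invitations (inviter_company_id INTEGER NOT NULL,
                          invited_company_id INTEGER NOT NULL,
                          message TEXT);
"""


def invite_company(conn, inviter_id, company_name, staff_email, message):
    """Create all invitation-related rows in a single transaction."""
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        company_id = conn.execute(
            "INSERT INTO companies (name) VALUES (?)", (company_name,)
        ).lastrowid
        user_id = conn.execute(
            "INSERT INTO users (email) VALUES (?)", (staff_email,)
        ).lastrowid
        conn.execute(
            "INSERT INTO staff (user_id, company_id) VALUES (?, ?)",
            (user_id, company_id),
        )
        conn.execute(
            "INSERT INTO connections (from_company_id, to_company_id) VALUES (?, ?)",
            (inviter_id, company_id),
        )
        conn.execute(
            "INSERT INTO invitations (inviter_company_id, invited_company_id, message) "
            "VALUES (?, ?, ?)",
            (inviter_id, company_id, message),
        )
    # The invitation email would be sent here, after the commit has succeeded.
    return company_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
inviter_id = conn.execute("INSERT INTO companies (name) VALUES ('Acme')").lastrowid
conn.commit()

new_company_id = invite_company(conn, inviter_id, "Globex", "sam@globex.example", "Join us")
```

Sending the email only after the transaction commits is a design choice in this sketch: it avoids inviting someone to an account that was rolled back.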
Something to consider when designing an API call that triggers multiple actions is how long those downstream actions take. Leaving the API call blocked while things are processing in the background isn't the best idea in the world.
You could consider (depending on your use case) taking in the API request, immediately responding with a 200 status (or, more precisely, 202 Accepted, which signals that processing has not finished yet), and dropping the request onto an internal queue for processing. When your background service picks up the request, it can update whatever needs to be updated and manage the transactions appropriately. This also caters for horizontal scaling scenarios where lots of "worker" services can be deployed to process the requests.
As part of this you could consider adding another "status" endpoint where requests can be made to find out how things are going. To avoid lots of polling status requests, you could also take in callback details as part of the original API call, and have the callback invoked when the background processing is complete. Or you could do both!
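The queue, status lookup, and callback described above can be sketched in-process with the standard library. This is only an illustration under simplifying assumptions: a real deployment would use a durable queue and separate worker services, and the names submit, get_status, and worker are made up for this example.

```python
import queue
import threading
import uuid

jobs = {}                      # job_id -> "queued" | "processing" | "done" | "failed"
jobs_lock = threading.Lock()
work_queue = queue.Queue()


def set_status(job_id, state):
    with jobs_lock:
        jobs[job_id] = state


def get_status(job_id):
    """What a "status" endpoint would return for a job."""
    with jobs_lock:
        return jobs.get(job_id, "unknown")


def submit(payload, on_done=None):
    """What the API handler would do: enqueue and answer immediately."""
    job_id = str(uuid.uuid4())
    set_status(job_id, "queued")
    work_queue.put((job_id, payload, on_done))
    return 202, job_id


def worker(handle):
    """Background loop: process jobs until a None sentinel arrives."""
    while True:
        item = work_queue.get()
        if item is None:
            work_queue.task_done()
            return
        job_id, payload, on_done = item
        set_status(job_id, "processing")
        try:
            handle(payload)
            final = "done"
        except Exception:
            final = "failed"
        set_status(job_id, final)
        if on_done is not None:
            on_done(job_id, final)   # the optional completion callback
        work_queue.task_done()


finished = []
thread = threading.Thread(target=worker, args=(lambda payload: None,), daemon=True)
thread.start()

code, job_id = submit({"company": "Globex"},
                      on_done=lambda j, s: finished.append((j, s)))
work_queue.join()              # wait until the worker has handled the job
print(code, get_status(job_id))

work_queue.put(None)           # stop the worker
thread.join()
```

Running several worker threads (or processes) against the same queue is the in-process analogue of the horizontal scaling mentioned above.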

Reading Celery task progress using AJAX polling

I have a simple Celery task that writes some progress data to the database. I need to read this progress using a Django view to give the update to the user.
I used my own tables to write the progress and read it using AJAX polling from client side. Now it's not working and I don't know the reason.
My database backend is PostgreSQL. I tried changing the transaction isolation level using the following (in the read view):
from django.db import transaction
# 4 is READ UNCOMMITTED
transaction.connections.all()[0].connection.set_isolation_level(4)
I am not sure if this changes the isolation level for a new connection to the database or for the one the current transaction is using, but it doesn't seem to work. No progress data can be read until the task has finished and the transaction is committed.
Here is the second method I tried.
I also found update_state. I write all the progress updates using update_state, but they don't seem to be actually written to the database. I run celerycam and configured Celery to send events with the -E argument.
I want to know the proper way to update progress data and retrieve it.
Thank you.
After some Googling I found out that "READ UNCOMMITTED" is not implemented in PostgreSQL and most probably won't be implemented in the future.
I also found an extension, part of a separate project, that allows you to read dirty data, but it forced me to use raw SQL to get the data I wanted.
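The underlying behaviour can be reproduced without PostgreSQL or Celery. The sketch below uses SQLite from the standard library as a stand-in (an assumption; the thread is about PostgreSQL), with one connection playing the task and another playing the polling view. It shows that a reader sees nothing written inside an open transaction, and that committing each progress update in its own short transaction makes it visible right away.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "progress_demo.db")

# isolation_level=None puts the connections in autocommit mode, so
# transactions start and end only where BEGIN/COMMIT are issued explicitly.
task_conn = sqlite3.connect(db_path, isolation_level=None)  # stands in for the task
view_conn = sqlite3.connect(db_path, isolation_level=None)  # stands in for the view

task_conn.execute("CREATE TABLE progress (task_id TEXT PRIMARY KEY, percent INTEGER)")
task_conn.execute("INSERT INTO progress VALUES ('job-1', 0)")


def poll(task_id):
    """What the AJAX-polled view would read."""
    rows = view_conn.execute(
        "SELECT percent FROM progress WHERE task_id = ?", (task_id,)
    ).fetchall()
    return rows[0][0]


# Case 1: the progress update is part of one long-running transaction.
task_conn.execute("BEGIN")
task_conn.execute("UPDATE progress SET percent = 50 WHERE task_id = 'job-1'")
seen_before_commit = poll("job-1")   # the reader still sees the old value
task_conn.execute("COMMIT")
seen_after_commit = poll("job-1")    # only now is the update visible

# Case 2: the progress update is committed on its own (autocommit here).
task_conn.execute("UPDATE progress SET percent = 75 WHERE task_id = 'job-1'")
seen_autocommit = poll("job-1")      # visible immediately

print(seen_before_commit, seen_after_commit, seen_autocommit)
```

In a real setup this suggests writing progress through a connection that is not inside the task's long transaction, rather than trying to lower the reader's isolation level.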

Django - logging each view's action

I'm thinking of creating a log system for my Django web application. The web application is quite comprehensive in its use (covers all aspects of a business's processes) so I'd like to track every event that happens. Specifically, I'd like to log every view that runs, not just the "main" ones, and, potentially, log what is happening within the view as it's executed.
While I'm in the "idea" stage of the logging system, I've quickly hit a few questions that leave me unsure how to proceed. Here are the main questions I have:
I'm thinking of logging all of the events in the same MySQL database in which the main web app holds its data. The concern I have is bloating the MySQL database into a massive DB. Also, if the DB crashes or is destroyed somehow (yes, I have backups) I'll lose my log too, which blows away any ability to track down the problem. Do I use a separate DB or just go with text files?
How granular do I go? Initially I was thinking of simply logging things like, "Date - In view myView". However, as I'm thinking about it, it would be nice to log all the stuff that happens within the view. Doing this could make the log massive! and would also make my code ugly with so many log entry lines mixed into the code. This kind of detail:
Date - entered view myView
Date - in view myView, retrieved object myObject from the DB
Date - in view myView, setting myObject field myField to myNewValue
Date - leaving myView
Those are my main thoughts at this point. Any advice on this front?
Thanks
I think the best and right way is to create your own custom middleware where you can log actually everything you need.
Here are some links on the subject:
middleware snippets
http://djangosnippets.org/snippets/2624/
http://djangosnippets.org/snippets/290/
http://djangosnippets.org/snippets/264/
django-logging-middleware (pretty old but may give you an idea)
django-request
django.db.backends logging
Is there a Django middleware/plugin that logs all my requests in a organized fashion?
Django verbose request logging
log all sql queries
django orm, how to view (or log) the executed query?
Also, consider using Sentry, an error logging and aggregation platform, instead of writing logs into the database. FYI, see using a database for logging.
If you want to log every view that runs, you can, for example, replace the separate "entered view A" and "exited view A" entries with a single line such as: view A - 147ms.
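One way to get a single "view A - 147ms" line per call is a timing decorator built on the standard logging module. This is a sketch rather than a Django-specific API: the log_view name and the "views" logger name are assumptions, and the same timing logic could live in a middleware instead.

```python
import functools
import logging
import time

logger = logging.getLogger("views")  # hypothetical logger name


def log_view(view):
    """Log one line per call with the view's dotted name and elapsed time."""
    @functools.wraps(view)
    def wrapper(request, *args, **kwargs):
        start = time.perf_counter()
        try:
            return view(request, *args, **kwargs)
        finally:
            # Runs whether the view returned normally or raised.
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info("view %s.%s - %.0fms",
                        view.__module__, view.__name__, elapsed_ms)
    return wrapper
```

Applying it is a one-line change per view (`@log_view` above the function), which keeps the logging calls out of the view bodies.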
As alecxe stated, you can log requests/SQL, there are plenty of ways to do it with middleware. About database (object) actions, you can tie individual saves, updates and deletes with signals.
For bulk updates and deletes, you could (it's not a clean way but it would work) monkey-patch manager and queryset methods to add logging.
This way you can log actions rather than SQL.
I would see lines like this:
[2013/09/11 15:11:12.0153] view app.module.view 200 148ms
[2013/09/11 15:11:12.0189] orm save:auth.User,id=1 3ms
This is a quick and dirty proposal, but, maybe it's worth it.

Marking users as new when created via a backend's authenticate in Django

I have an authentication backend based off a legacy database. When someone logs in using that database and there isn't a corresponding User record, I create one. What I'm wondering is if there is some way to alert the Django system to this fact, so that for example I can redirect the brand-new user to a different page.
The only thing I can think of is adding a flag to the users' profile record called something like is_new which is tested once and then set to False as soon as they're redirected.
Basically, I'm wondering if someone else has figured this out so I don't have to reinvent the wheel.
I found the easiest way to accomplish this is to do exactly as you've said. I had a similar requirement on one of my projects. We needed to show a "Don't forget to update your profile" message to any new member until they had visited their profile for the first time. The client wanted it quickly so we added a 'visited_profile' field to the User's profile and defaulted that to False.
We settled on this because it was super fast to implement, didn't require tinkering with the registration process, worked with existing users, and didn't require extra queries every page load (since the user and user profile is retrieved on every page already). Took us all of 10 minutes to add the field, run the South migration and put an if tag into the template.
There are two methods that I know of to determine if an object has been created:
1) When using get_or_create, a tuple of the form (obj, created) is returned, where created is a boolean indicating whether the object was created or not.
2) The post_save signal passes a created parameter, also a boolean, also indicating whether the object was created or not.
At the simplest level, you can use either of these two hooks to set a session var, that you can then check and redirect accordingly.
If you can get by with it, you could also directly redirect either after calling get_or_create or in the post_save signal.
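The flow of "remember the created flag in the session, check it once, then redirect" can be sketched without Django. Everything here is a stand-in: get_or_create_user only imitates the (obj, created) return shape described above using a plain dict, the session is a plain dict, and the URLs are made up.

```python
def get_or_create_user(users, username):
    """Dict-backed stand-in that returns (user, created) like get_or_create."""
    if username in users:
        return users[username], False
    user = {"username": username}
    users[username] = user
    return user, True


def login(users, session, username):
    """What the authentication step would do: flag brand-new users."""
    user, created = get_or_create_user(users, username)
    if created:
        session["is_new_user"] = True
    return user


def landing_url(session):
    """Check the flag once; pop() clears it so later requests go home."""
    if session.pop("is_new_user", False):
        return "/welcome/"   # hypothetical page for first-time users
    return "/home/"          # hypothetical default page


users, session = {}, {}
login(users, session, "alice")
first = landing_url(session)
second = landing_url(session)
print(first, second)
```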
You can use a file-based cache to store the users that aren't yet saved to the database. When the user logs in for the second time, you can look in the cache, find the user object, and save it to the database for good.
Here's some info on django caching: http://docs.djangoproject.com/en/dev/topics/cache/?from=olddocs
PS: don't use Memcached for this, because it keeps data only in memory, so everything is lost if the machine crashes or shuts down.

How can I track a user's events in Django?

I'm building a small social site, and I want to implement an activity stream in a user's profile to display events like commenting, joining a group, posting something, and so on. Basically, I'm trying to make something similar to a reddit profile that shows a range of user activity.
I'm not sure how I'd do this in Django though. I thought of maybe making an "Activity" model that's OneToOne with their account, and updating it through middleware.
Does anyone here have a suggestion? A way I could actually implement this in a nice way?
You pretty much need to use an explicit Activity model, then create instances of those records in the view functions that perform the action.
I think you'll find that any other more automatic way of tracking activity would be too inflexible: it would record events at the wrong level of detail, and prevent you from describing events in a way that the user wants to see them.
In my opinion, you should do exactly what you're saying, that is, create an Activity model with a ForeignKey to User, and populate it whenever something you find 'interesting' happens.
This practice, even if redundant, will speed up your page generation. You can also add a custom field to hold the text you want to display, and you can keep track of what generated the Activity.
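The shape of such an Activity record can be sketched with plain SQL through the standard sqlite3 module. The table and column names and the two helper functions are hypothetical; in Django the same structure would be a model with a ForeignKey to User. Note that the relation is many-to-one (many activities per user), not OneToOne.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
CREATE TABLE activities (
    id           INTEGER PRIMARY KEY,
    user_id      INTEGER NOT NULL REFERENCES users(id),  -- many activities per user
    kind         TEXT NOT NULL,                          -- what generated the activity
    display_text TEXT NOT NULL,                          -- precomputed text for the profile
    created_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
""")


def record_activity(conn, user_id, kind, display_text):
    """Called explicitly from the code that performs the action."""
    with conn:
        conn.execute(
            "INSERT INTO activities (user_id, kind, display_text) VALUES (?, ?, ?)",
            (user_id, kind, display_text),
        )


def activity_stream(conn, user_id, limit=20):
    """Newest-first list of (kind, display_text) for a profile page."""
    return conn.execute(
        "SELECT kind, display_text FROM activities "
        "WHERE user_id = ? ORDER BY id DESC LIMIT ?",
        (user_id, limit),
    ).fetchall()


with conn:
    user_id = conn.execute("INSERT INTO users (username) VALUES ('alice')").lastrowid

record_activity(conn, user_id, "comment", "alice commented on a post")
record_activity(conn, user_id, "join_group", "alice joined the Django group")
stream = activity_stream(conn, user_id)
```

Storing the display text at write time is the redundancy the answer mentions: the profile page can render the stream from one query without re-deriving each message.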