How to handle network errors when saving to Django Models - django

I have a Django .save() call that executes in a loop n times.
My concern is how to guard against network errors during saving: some entries could be saved while others aren't, with no way of telling which.
What is the best way to make sure that the execution is completed?
Here's a sample of my code:
# SAVE DEBIT ENTRIES
for i in range(len(debit_journals)):
    # UPDATE JOURNAL RECORD
    debit_journals[i].approval_no = journal_transaction_id
    debit_journals[i].approval_status = 'approved'
    debit_journals[i].save()

Either use bulk_create / bulk_update to execute a single DB query, or use transaction.atomic as a decorator on your function so that any error during save rolls the database back to the state it was in before your function ran.
Try something like the below (I'm assuming your model is named DebitJournal and debit_journals is a list).
for debit_journal in debit_journals:
    debit_journal.approval_no = journal_transaction_id
    debit_journal.approval_status = 'approved'

DebitJournal.objects.bulk_update(debit_journals, ["approval_no", "approval_status"])
If debit_journals is a QuerySet you can also try
debit_journals.update(approval_no=journal_transaction_id, approval_status='approved').
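If you prefer to keep the per-row save() calls, a minimal sketch of the transaction.atomic approach mentioned above could look like this (assuming the loop lives in a function you control):

from django.db import transaction

@transaction.atomic
def approve_debit_journals(debit_journals, journal_transaction_id):
    # If any save() raises (e.g. the connection drops mid-loop), the whole
    # transaction is rolled back and no entry is left half-approved.
    for debit_journal in debit_journals:
        debit_journal.approval_no = journal_transaction_id
        debit_journal.approval_status = 'approved'
        debit_journal.save()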

It depends on what you call a network error: whether it's between the user and the Django application, or between the Django application and the database. If it's only between the user and the app, note that if the request has been sent correctly, the objects will be created even if the user loses the connection afterwards. So a user might not get the response, but the objects will still be created.
If it's between the database and the django application some objects might still be created before the error.
Usually if you want an "all or nothing" behaviour you should use manual transactions as described here: https://docs.djangoproject.com/en/4.1/topics/db/transactions/
Note that if the creation is really long you might hit the request timeout. If the creation takes more than a few seconds you should consider making it a background task. The request is only there to create the task.
See Python task queue alternatives and frameworks for 3rd party solutions.
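For reference, a rough sketch of handing the work off to a background task, assuming Celery (the DebitJournal model and the journals app path are assumptions carried over from the first answer; django-rq would look very similar):

# tasks.py
from celery import shared_task
from django.db import transaction

from journals.models import DebitJournal


@shared_task
def approve_debit_journals_task(journal_ids, journal_transaction_id):
    # run as a single query inside a transaction, independent of the request
    with transaction.atomic():
        DebitJournal.objects.filter(pk__in=journal_ids).update(
            approval_no=journal_transaction_id,
            approval_status='approved',
        )

The view then only enqueues the work, e.g. approve_debit_journals_task.delay([j.pk for j in debit_journals], journal_transaction_id), and returns immediately.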

Related

Django ORM returning stale data, possible race condition

Consider a Django application with a single RESTful API that creates objects (using Django REST Framework). As part of this API, I do some validation to make sure the creation calls are idempotent, such that if you call the creation API twice, the first will succeed, and the second will fail with a custom error code.
I have a scenario for testing this API which intermittently fails in the following way:
First API call, success, returns 201 -> object has supposedly been created
Immediately after response, second API call is made
Validation logic calls MyModel.objects.get(some_field=some_value) to check if this is a duplicate call or not
No such object is found, despite being created in step 1, thus a duplicate object is created
When inspecting the admin/querying the model, both objects can be seen.
Some more data:
There is no explicit caching on this model, or any other caching involved in this process.
I am unable to reproduce this locally.
On my deployment setup there is about a 5% failure rate for this possible race condition.
Both local and deployment use PostgreSQL.
Deployment environment does have general caching enabled, but when enabling cache locally still no repro.
What might be causing this race condition? Does Django ORM have any failure modes where I might be getting stale data? Is there any way I can defensively protect the validation from getting stale data?
Have a look at transaction.atomic:
https://docs.djangoproject.com/en/1.8/topics/db/transactions/#django.db.transaction.atomic
This sometimes solves such issues.
My current proposed solution is based on the feedback from @CraigRinger, which seems to be correct. Basically, to get a consistent response from Postgres I need to actually attempt an INSERT rather than just query for the data, because there are race conditions in play.
A partial reference to this can be found in https://code.djangoproject.com/ticket/20429#comment:22
Bottom line, the solution is to add a DB-enforced unique=True constraint on the relevant field on the model (some_field in this case), attempt the object creation, catch the IntegrityError, and from there on I can implement the custom error handling and propagate the right result to the API layer.
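A rough sketch of that pattern, reusing the MyModel / some_field names from the question (the surrounding API error handling is only indicated in a comment):

from django.db import models, IntegrityError


class MyModel(models.Model):
    # unique=True makes PostgreSQL enforce the constraint at INSERT time,
    # closing the window between a .get() existence check and the create.
    some_field = models.CharField(max_length=255, unique=True)


def create_idempotently(some_value):
    try:
        return MyModel.objects.create(some_field=some_value)
    except IntegrityError:
        # a concurrent request already inserted this value; translate this
        # into the custom "duplicate" error code at the API layer
        raise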

Multi-tenant Django applications using Mongoengine

I want to build a multi-tenant architecture for a SaaS system. We are using Django as our backend, mongoengine as our main database and gunicorn as our web server.
Our clients are a few big companies, so the number of databases, and the space each one pre-allocates, shouldn't be a problem.
The first approach we took was to write a middleware to determine the source of the request to properly connect to a mongoengine database. Here is the code:
class MongoConnectionMiddleware(object):
    def process_request(self, request):
        if request.user.is_authenticated():
            mongo_connect(request.user.profile.establishment)
And the mongo_connect method:
def mongo_connect(establishment):
    db_name = 'db_client_%d' % establishment.id
    connect(db_name)
This will register the "default" alias as the db_name for every mongoengine request.
But it seems that when many concurrent users from different companies are making requests, each one sets the default db_name to its own name.
As an example:
Company A makes a request and connects to database A. While A is doing its work, company B connects to database B. This makes A also connect to B's database in the process, so A fails to find some ids.
Is there a way to isolate the connection to the mongo database per request to avoid this problem?
Unfortunately MongoEngine seems to be designed around a very basic use case of a single primary connection and multiple auxiliary connections.
http://docs.mongoengine.org/en/latest/guide/connecting.html#connecting-to-mongodb
To get around the default connection logic, I define the first connection I come across as the default, I also add it as a named connection. I then add any subsequent connection as named connections only.
https://github.com/MongoEngine/mongoengine/issues/607#issuecomment-38651532
You can use the with_db decorator to switch from one connection to another, but it's a contextmanager call, which means as soon as you leave the with statement, it will revert. It also still requires a default connection.
http://docs.mongoengine.org/en/latest/guide/connecting.html#switch-database-context-manager
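For reference, a minimal sketch of the context-manager usage from that docs page; the 'db-b' alias and the MyDocument class are just illustrative assumptions:

from mongoengine import Document, StringField, connect
from mongoengine.context_managers import switch_db


class MyDocument(Document):
    name = StringField()


connect('db_client_a')                # registers the 'default' alias
connect('db_client_b', alias='db-b')  # a second, named alias

# inside the block, MyDocument queries go to the 'db-b' connection
with switch_db(MyDocument, 'db-b') as MyDocumentB:
    MyDocumentB(name='created in db-b').save()
# outside the block, MyDocument reverts to its default connection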
You might be able to put it inside a function and then yield inside the with to prevent it reverting immediately, I'm not sure if this is valid.
You could use a wrapper of some kind, either a function, class or a custom QuerySet, that checks the current django/flask session and switches the db to the appropriate connection.
I'm not sure if a QuerySet can do this, but it would probably be the nicest way if it can.
http://docs.mongoengine.org/en/latest/guide/querying.html#custom-querysets
I included some code in this issue here where I change the database connection for my models.
https://github.com/MongoEngine/mongoengine/issues/605
def switch(model, db):
    model._meta['db_alias'] = db
    # must set _collection to None so it is re-evaluated
    model._collection = None
    return model

MyDocument = switch(MyDocument, 'db-alias')
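Tying that helper to the per-request idea above, a rough sketch (the profile/establishment attributes come from the question; the list of tenant-scoped documents and the alias naming are assumptions):

class MongoSwitchMiddleware(object):
    def process_request(self, request):
        if request.user.is_authenticated():
            # assumes each 'db_client_<id>' alias was registered at startup
            # with connect(..., alias=...)
            alias = 'db_client_%d' % request.user.profile.establishment.id
            # re-point each tenant-scoped document at this request's alias
            # using the switch() helper above; note this mutates class-level
            # state, so it is only safe if each worker process handles one
            # request at a time (e.g. sync gunicorn workers)
            for model in (MyDocument,):
                switch(model, alias)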
You'll also want to take a look at the code that mongoengine uses to switch dbs.
Beware that MongoEngine likes to cache things, so changing a few variables here and there doesn't always have an effect. It's full of surprises like this.
Edit:
I should also add that the 'connect' call won't pick up value changes. So calling connect with new parameters won't take effect unless it's a new alias. Even the disconnect function (which isn't exposed publicly) doesn't let you do this, as the models will cache the connection. I mention this in some of the issues linked above and also here: https://github.com/MongoEngine/mongoengine/issues/566

How do I update my DB data (MongoDB) after a fixed interval of time using Django?

My User collection contains data such as
{"user1":"zera",
"my_status":"active",
"date_creation" : ISODate("2013-10-01T10:15:52.055Z")
}
{"user2":"dfgf",
"my_status":"noactive",
"date_creation": ISODate("2013-10-01T08:55:41.212Z")
}
I need to find each user with my_status :"active" and update their my_status after 24 hours from each user's date_creation.
Can anyone suggest a method to do it using django?
Well, I'd write an async task to keep polling the database to check for users with active status. If the user is active, update their status.
For the asynchronous tasks you can use python-rq; to make things easier there's a Django module for python-rq, django-rq. Celery is another popular and good option, and there's also a module for Django, which you can find here.
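A rough sketch of that polling task, assuming Celery and a mongoengine User document with the fields from the sample data (django-rq with rq-scheduler would work the same way):

from datetime import datetime, timedelta

from celery import shared_task
from mongoengine import DateTimeField, Document, StringField


class User(Document):
    my_status = StringField()
    date_creation = DateTimeField()


@shared_task
def expire_active_users():
    cutoff = datetime.utcnow() - timedelta(hours=24)
    # every user still 'active' whose record is more than 24 hours old
    User.objects(my_status='active', date_creation__lte=cutoff).update(
        set__my_status='noactive'
    )

You would then schedule expire_active_users to run every few minutes, for example with Celery beat.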

django-webtest with multiple test clients

In django-webtest, every TestCase subclass comes with self.app, which is an instance of webtest.TestApp, so I can make it log in as user A with self.app.get('/', user='A').
However, if I want to test the behaviour for both user A and user B in one test, how should I do it?
It seems that self.app is just DjangoTestApp() with extra_environ passed in. Is it appropriate to just create another instance of it?
I haven't tried setting up another instance of DjangoTestApp as you suggest, but I have written complex tests where, after making requests as user A, I have then switched to making requests as user B with no issue, in each case passing the user or username in when making the request, e.g. self.app.get('/', user='A') as you have already written.
The only part which did not work as expected was making unauthenticated requests, e.g. self.app.get('/', user=None); this continued to use the user from the request immediately prior.
To reset the app state (which should allow you to emulate most workflows with several users in a sequential manner) you can run self.renew_app() which will refresh your app state, effectively logging the current user out.
To test simultaneous access by more than one user (your question does not specify exactly what you are trying to test) then setting up another instance of DjangoTestApp would seem to be worth exploring.
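Putting those two points together, a minimal sketch (assuming users with usernames 'A' and 'B' already exist, e.g. created in setUp):

from django_webtest import WebTest


class TwoUserFlowTest(WebTest):
    def test_a_then_b(self):
        # requests made as user A
        response_a = self.app.get('/', user='A')
        self.assertEqual(response_a.status_code, 200)

        # reset the app state so A's session does not leak into B's requests
        self.renew_app()

        # requests made as user B
        response_b = self.app.get('/', user='B')
        self.assertEqual(response_b.status_code, 200)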

Updating a hit counter when an image is accessed in Django

I am working on doing some simple analytics on a Django website (v1.4.1). Seeing as this data will be gathered on pretty much every server request, I figured the right way to do this would be with a piece of custom middleware.
One important metric for the site is how often given images are accessed. Since each image is its own object, I thought about using django-hitcount, but figured that was unnecessary for what I was trying to do. If it proves easier, I may use it though.
The current conundrum I face is that I don't want to query the database and look for a given object for every HttpRequest that occurs. Instead, I would like to wait until a successful response (indicated by an HttpResponse.status_code of 200 or whatever), and then query the server and update a hit field for the corresponding image. The catch is that the only way to access the path of the image is in process_request, while the only way to access the status code is in process_response.
So, what do I do? Is it as simple as creating a class variable that can hold the path and then lookup the file once the response code of 200 is returned, or should I just use django-hitcount?
Thanks for your help
Set up a cron task to parse your Apache/Nginx/whatever access logs on a regular basis, perhaps with something like pylogsparser.
You could use memcache to store the counters and then periodically persist them to the database. There are risks that memcache will evict the value before it's been persisted but this could be acceptable to you.
This article provides more information and highlights a risk arising when using hosted memcache with keys distributed over multiple servers. http://bjk5.com/post/36567537399/dangers-of-using-memcache-counters-for-a-b-tests
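A rough sketch of that counter-in-memcache idea using Django's low-level cache API (the Image model, its hit_count field, and the myapp path are assumptions):

from django.core.cache import cache
from django.db.models import F

from myapp.models import Image


def record_image_hit(image_id):
    # called (e.g. from middleware) once a 200 response is seen; only touches the cache
    key = 'image-hits-%d' % image_id
    try:
        cache.incr(key)
    except ValueError:
        # the key was never set or has been evicted; start counting again
        cache.set(key, 1, timeout=None)


def persist_image_hits(image_ids):
    # run periodically (cron or a background task) to flush counters to the DB
    for image_id in image_ids:
        key = 'image-hits-%d' % image_id
        hits = cache.get(key, 0)
        if hits:
            Image.objects.filter(pk=image_id).update(hit_count=F('hit_count') + hits)
            # hits recorded between the get() and this reset are lost -- one of
            # the risks the linked article discusses
            cache.set(key, 0, timeout=None)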