Caching in Django's object model - django

I'm running a system with a few workers that take jobs from a message queue, all using Django's ORM.
In one case I'm actually passing a message along from one worker to another in another queue.
It works like this:
Worker1 in queue1 creates an object (MySQL INSERT) and pushes a message to queue2
Worker2 accepts the new message in queue2 and retrieves the object (MySQL SELECT), using Django's objects.get(pk=object_id)
This works for the first message. But for the second message, worker2 always fails because it can't find the object with id object_id (Django raises DoesNotExist).
This works seamlessly in my local setup with Django 1.2.3 and MySQL 5.1.66, the problem occurs only in my test environment which runs Django 1.3.1 and MySQL 5.5.29.
If I restart worker2 every time before worker1 pushes a message, it works fine. This makes me believe there's some kind of caching going on.
Is there any caching involved in Django's objects.get() that differs between these versions? If that's the case, can I clear it in some way?

The issue is likely related to the use of MySQL transactions. On the sender's side, the transaction must be committed to the database before notifying the receiver that there is an item to read. On the receiver's side, the transaction isolation level used for a session must be set such that new data becomes visible in the session after the sender's commit.
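A minimal sketch of the sender side, assuming a hypothetical MyModel and a push_to_queue2 helper (both placeholders), to show the ordering of commit and notification. It uses transaction.atomic, available from Django 1.6; older versions had commit_on_success for the same purpose:

from django.db import transaction

from myapp.models import MyModel        # placeholder model

def handle_job(payload):
    # Wrap the INSERT so it is definitely committed before we notify worker2.
    with transaction.atomic():
        obj = MyModel.objects.create(**payload)
    # The transaction has committed by this point; only now push the message,
    # so worker2's SELECT can actually see the new row.
    push_to_queue2({"object_id": obj.pk})   # placeholder for the queue client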
By default, MySQL uses the REPEATABLE READ isolation level. This poses problems when more than one process is reading/writing to the database. One possible solution is to set the isolation level in the Django settings.py file using a DATABASES option like the following:
'OPTIONS': {'init_command': 'SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED'},
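For context, a sketch of where that option sits in a full DATABASES entry (the name, user, and password values below are placeholders):

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'mydb',
        'USER': 'myuser',
        'PASSWORD': 'secret',
        'HOST': 'localhost',
        'PORT': '3306',
        # Make rows committed by other sessions visible to this connection
        # without having to reopen it.
        'OPTIONS': {
            'init_command': 'SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED',
        },
    },
}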
Note however that changing the transaction isolation level may have other side effects, especially when using statement based replication.
The following links provide more useful information:
How do I force Django to ignore any caches and reload data?
Django ticket#13906

Related

Django: Is it possible to open a transaction, ask a user for confirmation, then commit the transaction if confirmed or roll back

I've read the docs: https://docs.djangoproject.com/en/3.0/topics/db/transactions/
But what's not clear to me, is whether a transaction can span HTTP requests.
The idea is simple enough:
user submits a form
backend then:
    opens a transaction
    saves the data
    presents user with a confirmation form
user then confirms or cancels
backend then commits on confirmation or rolls back on cancel
The main issue is that the transaction is opened in one HTTP request, then a user response is awaited (if it never arrives, I imagine we'd roll back on a timeout), and when it comes in a second HTTP request, the transaction is committed.
I see nothing covering such a use case in the docs and have found nothing on-line. Yet it strikes me as a fairly ordinary use case. It arises primarily because the submission is complicated, involving many models and relations, and the easiest (almost the only sensible or tenable) way to examine the submission's impact is to save all of those and then study the impact. That works brilliantly as it happens, but I've thus far been forced to make a commit or roll back decision in the one request, when processing the form. I'd now like to throw my analysis back at the user and ask for an OK before I commit!
It strikes me that to do this, the second request needs to know which transaction the confirmation relates to, and to determine whether that transaction is open, and then commit it or roll it back. This adds a whole tier of transaction identification that I can't see in the Django docs.
Database support:
Interestingly, PostgreSQL can support this as long as the whole transaction belongs to a single session (database connection), as I suspect other databases do too. So this means it can only work if the save is performed by a persistent daemon that can start a transaction and stay running until the transaction is confirmed and committed or rolled back.
Which raises the ancillary question of whether Django provides such a facility. I suspect not, alas. I suspect that persistent workers are the domain of either uWSGI and/or Celery, and that the persistent daemon that holds the database connection pending the confirmation request is what's called a transaction manager.
And so this question really becomes, in terser language: is there an easy/canonical way of implementing a transaction manager for Django?
I am not aware of any library that does exactly what you describe. How I would go about this problem: after form submission, store the variables in the request session and perform the necessary DB validation checks; once the user confirms, execute the transaction.
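A rough sketch of that flow, where SubmitForm, Submission, the view/URL names, and the session key are all illustrative (and the cleaned form data must be session-serializable); the actual write is wrapped in transaction.atomic, covered just below:

from django.db import transaction
from django.shortcuts import redirect, render

from .forms import SubmitForm      # hypothetical form
from .models import Submission     # hypothetical model

def submit(request):
    form = SubmitForm(request.POST or None)
    if request.method == "POST" and form.is_valid():
        # Write nothing to the database yet; stash the validated data in the session.
        request.session["pending_submission"] = form.cleaned_data
        return redirect("confirm")
    return render(request, "submit.html", {"form": form})

def confirm(request):
    data = request.session.get("pending_submission")
    if data is None:
        return redirect("submit")
    if request.method == "POST":    # the user pressed "confirm"
        with transaction.atomic():
            Submission.objects.create(**data)   # the real write, done atomically
        del request.session["pending_submission"]
        return redirect("done")
    return render(request, "confirm.html", {"data": data})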
In Django you can perform atomic transactions using the transaction module:
from django.db import transaction
which allows you to use either the @transaction.atomic decorator or a context manager in a view:
def viewfunc(request):
    # This code executes in autocommit mode (Django's default).
    do_stuff()

    with transaction.atomic():
        # This code executes inside a transaction.
        do_more_stuff()
You can read more about it in the Django docs:
https://docs.djangoproject.com/en/3.0/topics/db/transactions/

Celery/Django transaction

The Celery user guide suggests that the Django transaction be manually committed before calling the task:
http://celery.readthedocs.org/en/latest/userguide/tasks.html#database-transactions
I want the system to be as reliable as possible. What is the best practice to recover from a crash between the transaction commit and the task call (i.e. to make sure the task is always called once the transaction is committed)?
BTW, right now I'm using a database-based job queue I implemented myself, so there is no such problem -- I can send jobs within the transaction. I'm not really convinced I should switch to Celery.
Since Django 1.9 this has been added:
transaction.on_commit(lambda: add_task_to_the_queue())
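A sketch of how that is typically combined with a Celery task in a view; Order and process_order are placeholders here:

from django.db import transaction
from django.http import HttpResponse

from .models import Order           # placeholder model
from .tasks import process_order    # placeholder Celery task

def create_order(request):
    with transaction.atomic():
        order = Order.objects.create(status="new")
        # The callback runs only after the surrounding transaction commits,
        # so the worker is guaranteed to find the row when the task executes.
        transaction.on_commit(lambda: process_order.delay(order.pk))
    # If the transaction rolls back instead, the callback is discarded
    # and the task is never enqueued.
    return HttpResponse("ok")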

How are multiple invocations of the same view handled in Django?

There is a view in Django behind a submit button; call it printSO.
Now, requests come to this view from two different browsers on the same machine; how does Django handle this?
Question:
Does it use threading to run the two invocations in parallel?
Considering the below scenario: pseudo code:
def results(request, emp_id):
    # if emp_id exists in the database, then delete it
    # send a response with the message "deleted"
Do we need to have any synchronization mechanism in the above code?
The Django development server is single-threaded and not suited to processing more than one request at a time (I believe this is due to the GIL).
However, when combined with a different server, such as Apache, the latter handles multithreading (in C). Here is some info on mod_wsgi:
Modwsgi
To your final question: no, you don't need to synchronize anything in most cases.
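To make that concrete for the pseudocode above, a sketch assuming a hypothetical Employee model: the delete is a single statement at the database level, so two concurrent requests don't need application-level locking; one of them removes the row and the other simply matches nothing.

from django.http import JsonResponse

from .models import Employee    # hypothetical model

def results(request, emp_id):
    # .delete() on a filtered queryset issues a single DELETE statement and
    # returns the number of rows removed (0 if another request got there first).
    deleted, _ = Employee.objects.filter(pk=emp_id).delete()
    return JsonResponse({"message": "deleted" if deleted else "not found"})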
Since Django 1.4 the development server has been multi-threaded (see here), though it is still not a production-level web server.

Django Celery FIFO

So I have these two applications connected with a REST API (JSON messages). One is written in Django and the other in PHP. I have an exact database replica on both sides (using MySQL).
When I press "submit" on one of them, I want that data to be saved in the current app's database, and a job to be started with Celery/Redis to update the remote database for the other app via REST.
My question is: how do I assign the same worker to my tasks in order to keep a FIFO order?
I need my data to be consistent and FIFO is really important.
OK, I am going to detail what I want to do a little further.
So I have this Django app, and when I press submit after filling in the form, my Celery worker wakes up and takes care of posting that submitted data to a remote server. This I can do without problems.
Now, imagine that my internet goes down at that exact time; my Celery worker keeps retrying to send until it is successful. But imagine I do another submit before my previous data is sent; my data won't be consistent on the other remote server.
Now that is my problem. I am not able to make these requests FIFO with the retry option given by Celery, so that's where I need some help figuring it out.
This is the answer I got from another forum:
Use named queues with celery:
http://docs.celeryproject.org/en/latest/userguide/workers.html#queues
Start a worker process with a single worker:
http://docs.celeryproject.org/en/latest/django/first-steps-with-django.html#starting-the-worker-process
Set this worker to consume from the appropriate queue:
http://docs.celeryproject.org/en/latest/userguide/workers.html#queues-adding-consumers
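Putting those three steps together, a sketch assuming a Celery app named proj, a dedicated queue called "sync", and a placeholder task that does the REST push (task_routes is the Celery 4+ setting name; older versions use CELERY_ROUTES):

from celery import Celery

app = Celery("proj", broker="redis://localhost:6379/0")

# Route the sync task to its own queue so it is never mixed with other work.
app.conf.task_routes = {"proj.tasks.push_to_remote": {"queue": "sync"}}

@app.task(name="proj.tasks.push_to_remote", bind=True, default_retry_delay=30, max_retries=None)
def push_to_remote(self, payload):
    try:
        send_via_rest(payload)          # placeholder for the actual REST call
    except ConnectionError as exc:
        raise self.retry(exc=exc)       # keep retrying until the remote is reachable

# Then start exactly one worker process consuming only that queue, e.g.:
#   celery -A proj worker -Q sync --concurrency=1
# With a single consumer, tasks in "sync" are processed one at a time,
# in the order they were enqueued.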
For the FIFO part, I can sort my Celery broker in FIFO order before sending my requests.

Django schedule task on timeout

I have a time-out set on an entity in my database, and a state (active/finished) assigned to it. What I want is to change that entity's state to finished when the time-out expires. I was thinking of using Celery to create a scheduled task with that associated time-out on object creation, which in turn would trigger a Django signal notifying that the object has 'expired'; after that I would set the value to finished in the signal handler. Still, this seems like a bit of an overhead, and I am thinking that there must be a more straightforward way to do this.
Thank you in advance.
Not necessarily light-weight, but when I was faced with this problem I had two solutions.
For the first, I wrote a Django manager that would create a queryset of "to be expired" objects and then delete them. To make this lighter, I kept the "to be expired on event" objects in their own table with a one-to-one relationship to the actual objects, and deleted these events once they're done, to keep that table small. The relationship between the "to be expired" object and the object being marked "expired" only causes a database hit on the second table when you dereference the ForeignKey field, so it's fairly lightweight. I would then run that management call every 5 minutes with cron (the scheduler for Unix, if you're not familiar with it). This was fine for an every-hour-or-so timeout.
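A sketch of that cron-driven approach as a management command, assuming a hypothetical Expiration event model with an expires_at field and a one-to-one "target" relation to the object being expired (all names here are illustrative):

from django.core.management.base import BaseCommand
from django.utils import timezone

from myapp.models import Expiration    # hypothetical "to be expired on event" table

class Command(BaseCommand):
    help = "Mark objects whose timeout has passed as finished"

    def handle(self, *args, **options):
        due = Expiration.objects.filter(expires_at__lte=timezone.now())
        for event in due.select_related("target"):
            event.target.state = "finished"             # flip the related object's state
            event.target.save(update_fields=["state"])
        due.delete()    # drop the processed events to keep this table small

# Run it from cron every 5 minutes, e.g.:
#   */5 * * * * python /path/to/project/manage.py expire_objects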
For more close-to-the-second timeouts, my solution was to run a separate server that receives, via REST calls from the Django app, notices of timeouts. It keeps a sorted list of when timeouts were to occur, and then calls the aforementioned management call. It's basically a scheduler of its own with scheduled events being fed to it by the Django process. To make it cheap, I wrote it using Node.js.
Both of these worked. The cron job is far easier.
If the state is always active until it's expired and always finished afterwards, it would be simpler to just have a "finished" datetime field. Everything with a datetime in the past would be finished and everything in the future would be active. Unless there is some complexity going on that is not mentioned in your question, that should provide the functionality you want without any scheduling at all.
Example:
import datetime

from django.db import models

class TaskManager(models.Manager):
    def finished(self):
        return self.filter(finish__lte=datetime.datetime.now())

    def active(self):
        return self.filter(finish__gt=datetime.datetime.now())

class Task(models.Model):
    finish = models.DateTimeField()

    objects = TaskManager()

    def is_finished(self):
        return self.finish <= datetime.datetime.now()