App Engine / Django - Interleaving multiple requests interfering in GAE-Sessions

I am running a Python application on App Engine using Django. Additionally, I am using a session-management library called gae-sessions. If threadsafe is set to "no", there is no problem, but when threadsafe is set to "yes", I occasionally see a problem with sessions being lost.
The issue that I am seeing is that when threading is enabled, multiple requests are occasionally interleaved in the GAE-Sessions middleware.
Within the gae-sessions library there is a variable called _tls, which is a threading.local() variable. When a user makes an HTTP request to the website, a function called process_request() is run first, followed by a bunch of custom HTML generation for the current page, and then a function called process_response() is run. State is remembered between process_request and process_response in the _tls "thread-safe" variable. I am able to check the identity of the _tls variable by printing out its value (e.g. "<thread._local object at 0xfc2e8de0>").
What I am occasionally witnessing is that multiple HTTP requests are being interleaved on what appears to be a single thread in the GAE-Sessions middleware (inferred to be a single thread because they share the same memory location for the thread-local object, and because data from one request appears to be overwriting data from another request). Given User1 and User2 making a request at the same time, I have witnessed the following execution order:
User1 -> `process_request` is executed on thread A
User2 -> `process_request` is executed on thread A
User2 -> `process_response` is executed on thread A
User1 -> `process_response` is executed on thread A
Given the above scenario, the User2 session stomps on some internal variables and causes the session of User1 to be lost.
So, my question is the following:
1) Is this interleaving of different requests in the middleware expected behaviour in App-Engine/Django/Python? (or am I totally confused, and there is something else going on here)
2) At what level is this interleaving happening (App-Engine/Django/Python)?
I am quite surprised by this behaviour, and so would be interested to understand why and what is happening here.

I found the following links to be helpful in understanding what is happening:
http://blog.notdot.net/2011/10/Migrating-to-Python-2-7-part-1-Threadsafe
Is Django middleware thread safe?
http://blog.roseman.org.uk/2010/02/01/middleware-post-processing-django-gotcha/
Assuming that I am understanding everything correctly, the reason that the above happened is the following:
1) When Django is running, it runs most of the base functionality in a parent (common) thread that includes the Django Middleware.
2) Individual requests are run in child threads which can interact with the parent thread.
The result of the above is that requests (child threads) can indeed be interleaved within the middleware - and this is by design (running only a single copy of Django and the middleware saves memory, is more efficient, etc.). [See the first article that I linked to in this answer for a quick description of how threading and child/parent processes interact.]
With respect to GAE-Sessions: the thread we were examining was the same for different requests because it was the parent thread (common to all children/requests), rather than the child thread that actually entered the middleware each time.
GAE-Sessions was storing state data in the middleware, which could be overwritten by different requests, given the possible interleaving of child threads within the parent (Django + middleware) thread. The fix I applied to GAE-Sessions was to store all state data on the request object, as opposed to within the middleware.
Fixes: previously a writable reference to response-handler functions was stored in the DjangoSessionMiddleware object as self.response_handlers - which I have moved to the request object as request.response_handlers. I also removed the _tls variable and moved the data it contained into the request object.
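For illustration, here is a minimal sketch of that kind of fix, assuming a classic Django process_request/process_response middleware; the method bodies here are illustrative, not gae-sessions' actual code:

class DjangoSessionMiddleware(object):
    def process_request(self, request):
        # Store per-request state on the request object itself, never on
        # self or in a threading.local - both can be shared between
        # interleaved requests.
        request.response_handlers = []

    def process_response(self, request, response):
        # Read the state back from the same request object; a concurrently
        # processed request cannot overwrite it.
        for handler in getattr(request, 'response_handlers', []):
            handler(response)
        return response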

Related

Making stored function based confirmation / callback system work with multiple processes in django + nginx

Our callback system worked such that during a request where you needed more user input you would run the following:
def view(req):
    # do checks, maybe get a variable.
    bar = req.bar()
    def doit():
        foo = req.user
        do_the_things(foo, bar)
    req.confirm(doit, "Are you sure you want to do it")
From this, the server would store the function object in a dictionary, with a UID as a key that would be sent to the client, where a confirmation dialog would be shown. When OK is pressed, another request is sent to another view, which looks up the stored function object and runs it.
This works in a single-process deployment. However, with nginx, if the process pool is greater than 1, a different process gets the confirmation request, and thus doesn't have the stored function and cannot run it.
We've looked into ways to force nginx to use a certain process for certain requests, but haven't found a solution.
We've also looked into multiprocessing libraries and Celery; however, there doesn't seem to be a way to send a predefined function into another process.
Can anyone suggest a method that will allow us to store a function to run later when the request for continuing might come from a separate process?
There doesn't seem to be a good reason to use a callback defined as an inline function here.
The web is a stateless environment. You can never be certain of getting the same server process two requests in a row, and your code should never be written to store data in memory.
Instead you need to put the data into a data store of some kind. In this case, the session is the ideal place: you can store the IDs there, then redirect the user to a view that pops them from the session and runs the process on the relevant IDs. Again, no need for an inline function at all.
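A rough sketch of what that could look like with Django views; the URL names, the pending_action session key, and compute_bar are placeholders, and do_the_things stands in for the helper from the question:

from django.shortcuts import redirect

def view(request):
    bar = compute_bar(request)  # whatever IDs/values the action needs
    request.session['pending_action'] = {'user_id': request.user.id, 'bar': bar}
    return redirect('confirm')  # a view that renders the "Are you sure?" page

def confirm_view(request):
    if request.method == 'POST':  # the user pressed OK
        data = request.session.pop('pending_action', None)
        if data is not None:
            do_the_things(data['user_id'], data['bar'])
    return redirect('done')

Since the session lives in the database (or cache), any worker process can pick up the confirmation request.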

Realm accessed from incorrect thread on next App run

So we have an app where the user needs to log in. During login, data is downloaded from the internet and written into the Realm database.
If the app is closed and reopened, we want it to remember the logged-in user so they don't need to log in again. Everything is fine and OK during the first user login. When the app is closed and reopened, the Realm database throws an "Accessed from incorrect thread" error...
I can't provide much code as I don't know where the issue is. I would like to know: when the app is rerun, is it on a different thread than before? And if so, how can data created on the previous thread be accessed on the new thread without encountering the said error?
Any help will be appreciated... Thanks in advance
As you have encountered, you can't access a realm from a different thread than the one it was opened on. It is possible, however, to open multiple instances of the same realm on different threads (or on the same thread if that's needed). Opening a realm is not an expensive operation, so there's no performance issue in opening several.
I'm guessing in your case that you're downloading the data on a background thread. I'm also guessing the realm is first opened in the callback to that network request. That means the realm is opened on the thread that callback is on. If you try to access that realm on the main thread when reopening the app (or any other thread that's not the same thread as before) you'll get an error.
Best practice is to open a new realm every time you know you're doing work on a different thread. As I mentioned, this is not an expensive operation and can be done liberally.
If you have some sort of RealmService or RealmManager as a singleton, I'd recommend against it. If the realm is initialised on the main thread, you won't be able to add records to it from a background thread.
In short: whenever you are doing operations on a realm in a callback, unless you are 100% certain you are on the same thread you opened the realm on, create a new realm instance and use that for your operations.
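For what it's worth, the general "one instance per thread" pattern looks something like this Python sketch; open_realm is a stand-in, since Realm's real API differs per platform (e.g. Realm.getDefaultInstance() on Android):

import threading

_local = threading.local()

def get_realm():
    # Open and cache a separate instance for the current thread; opening
    # is cheap, so doing this once per thread is fine.
    if not hasattr(_local, 'realm'):
        _local.realm = open_realm()  # placeholder for the platform's open call
    return _local.realm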

Django non-blocking save?

Is there a way to call save() on a model in Django without waiting for a response from the DB?
You could consider this async, though I need less than that, as async calls usually give you a callback, which I don't need here.
So basically I want to run
SomeModel.objects.bulk_create([list of objects])
every, say, 1000 objects, without this line blocking my code. I will have no further use for these rows in my code.
I'm looking for something simple; a package like Celery seems to offer way more than this...
As of 2016, Django is a web framework that (for the moment, if we ignore Channels) takes an HTTP request "as argument" and returns an HTTP response as soon as possible.
This architecture means there is no concept of an asynchronous operation in the framework. If you want to delay the save and return a response to the user without waiting, you can:
run another thread/async block (which can be tedious with database transactions...);
use a service like IronWorker that lets you queue operations to run async as soon as possible;
use Celery, which may bring more features than your case needs but will do a better job than some homemade solution.
rq (Redis Queue) is another option for asynchronous operations (apart from those that Maxime Lorant mentions in his answer). It uses Redis as a broker (the middle man that holds the tasks), so if you are already using Redis or would like to add it to your project, you should consider it. It's a nice and simple solution, much simpler than Celery. There is also django-rq, a simple app that provides Django integration for rq.
Update:
Summarizing comments
django_rq provides a management command (rqworker) that starts a worker process. Any job put in the queue will be executed by this process. You can either send one job to the queue for each object (a job would be a function that takes an object as an argument and saves it to the database) or collect a list of objects and send one job with that list. In the second case you need to temporarily store this list somewhere, which can be tricky.
Using redis to temporarily store the objects (recommended)
I think that the most robust way to do it is to serialize the objects to JSON and store them in a redis list. Then regularly check its length, and when it reaches the desired length, send a job to the queue with this list in its arguments, as sketched below.
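A rough sketch of that approach, assuming django-rq and redis-py; the key name, the batch size, and the save_batch job are placeholders:

import json
import redis
import django_rq
# from myapp.models import SomeModel  (the model from the question)

r = redis.Redis()
BATCH_SIZE = 1000

def save_batch(serialized_objects):
    # Runs inside the rq worker process started by `manage.py rqworker`.
    objs = [SomeModel(**json.loads(s)) for s in serialized_objects]
    SomeModel.objects.bulk_create(objs)

def queue_object(obj_fields):
    # Called from the request/response cycle; returns immediately.
    r.rpush('pending_objects', json.dumps(obj_fields))
    if r.llen('pending_objects') >= BATCH_SIZE:
        # Note: this read-then-delete is not atomic; a production version
        # would wrap it in a redis pipeline/transaction.
        batch = [s.decode() for s in r.lrange('pending_objects', 0, -1)]
        r.delete('pending_objects')
        django_rq.enqueue(save_batch, batch)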
Using the worker's memory to temporarily store the objects
You could also use your worker's RAM as temporary storage, since the worker process has its own memory. In this case the main process (the runserver) creates a job with an object. The job doesn't save the object; it just adds it to a list. You can keep appending objects to this list. Since the jobs are executed in the worker process, this list lives in the worker's memory. When it reaches the desired length, you can save all the objects.
But imagine the case in which you create more than one worker. Each job in the queue will then be picked up by whichever worker is currently free, so some objects will be appended to a list in the memory of worker_1, others to the list of worker_2, etc., and you would have to deal with as many lists as there are workers.

Beginner needs help with design/planning report functionality in a C++ application

I'm somehow stuck with implementing reporting functionality in my log-parser application.
This is what I did so far:
I'm writing an application that reads logfiles and searches the strings for multiple regular expressions that can be defined in a user configuration file. For every so-called "StringPipe" definition that is parsed from the configuration, the main process spawns a worker thread that searches for a single regex. The more definitions the user creates, the more worker threads are spawned. The main function reads a bunch of log strings and then sends the workers to process the strings, and so on.
Now I want every worker thread that is spawned to report information about the number of matches it has found, how long it took, what it did with those strings, and so on. This information is used to export as CSV, write to a DB, and so on.
Now I'm stuck at the point where I created a class Report. This class provides member functions that are called by the worker threads so that the Report class gathers the information needed to generate the report.
For that, my workers (which are boost::threads / functors) have to create a Report object on which they can call those reporting functions.
The problem is in my design: when a worker thread finishes its job, it is destroyed, and for the next bunch of strings that needs to be processed a new instance of the worker functor is spawned, so it needs to create a new Report object.
This is a problem, as I understand it, because I need some kind of container where every worker can store its reported info, and finally a global report that contains information such as how long the whole processing took, which worker was slowest, and so on.
I just need to collect all this information together, but how can I do that? Every time a worker stops, reports, and then starts again, it destroys the Report object and its members, so all the information from the previous work is gone.
How can I solve this problem or how is such a thing handled in general?
First, I would not spawn a new thread to do the RE searching and such. Rather, you almost certainly want a pool of threads to handle the jobs as they arise.
As far as retrieving and processing the results goes, it sounds like what you want are futures. The basic idea is that you create an object to hold the result of the computation, and a future to keep track of when the computation is complete. You can either wait for the result to be ready, or register a callback to be called when a future completes.
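The pool-plus-futures pattern looks roughly like the following; it is sketched with Python's concurrent.futures for brevity (Boost.Thread offers the same idea via packaged_task and futures), and search_chunk with its stats dictionary is an illustrative stand-in:

import re
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def search_chunk(regex, lines):
    # One worker job: it returns its own stats rather than writing into a
    # shared Report object, so no locking is needed while searching.
    start = time.monotonic()
    matches = sum(1 for line in lines if regex.search(line))
    return {'pattern': regex.pattern,
            'matches': matches,
            'seconds': time.monotonic() - start}

def run_report(regexes, lines):
    report = []  # owned by the main thread only
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(search_chunk, rx, lines) for rx in regexes]
        for fut in as_completed(futures):
            report.append(fut.result())  # each future yields one worker's stats
    return report

# e.g. run_report([re.compile(r'ERROR'), re.compile(r'WARN')], log_lines)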
Instead of having the worker thread create the report object, why not have the main thread create the empty report and pass a pointer to it to the worker thread when it is created? Then the worker thread can report back when it has completed the report, and the main thread can add the data from that report to some main report.
So the worker thread never has ownership of the actual report; it will just populate its data fields and report back to the main thread.

Application Object and Concurrency Concerns

In some ASP tutorials, like this one, I observe the following pattern:
Application.Lock
'do some things with the application object
Application.Unlock
However, since web pages can have multiple instances, there is an obvious concurrency problem. So my questions are the following:
What if one page tries to lock while the object is already locked?
Is there a way to detect whether the application object is locked?
Is it better to just work on an unlocked application object or does that have other consequences?
What if there is only one action involving the application object? Is there a reason to lock/unlock in that case?
From the MSDN documentation:
The Lock method blocks other clients from modifying the variables stored in the Application object, ensuring that only one client at a time can alter or access the Application variables.
If you do not call the Application.Unlock method explicitly, the server unlocks the locked Application object when the .asp file ends or times out.
A lock on the Application object persists for a very short time because the application object is unlocked when the page completes processing or times out.
If one page locks the application object and a second page tries to do the same while the first page still has it locked, the second page will wait for the first to finish, or until the Server.ScriptTimeout limit is reached.
An example:
<%@ Language="VBScript" %>
<%
Application.Lock
Application("PageCalls") = Application("PageCalls") + 1
Application("LastCall") = Now()
Application.Unlock
%>
This page has been called <%= Application("PageCalls") %> times.
In the example above, the Lock method prevents more than one client at a time from accessing the variable PageCalls. If the application had not been locked, two clients could simultaneously try to increment the variable PageCalls.
There will be consequences if you use the application object unlocked. For example, suppose you want to implement a global counter:
Application("myCounter") = Application("myCounter") + 1
The above code will at times miscount. It reads, adds, and assigns. If two threads try to perform this at the same time, they may read the same value and then subsequently write the same value, incrementing myCounter by 1 instead of 2.
What's needed is to ensure that the second thread can't read myCounter until the first thread has written to it. Hence this is better:
Application.Lock
Application("myCounter") = Application("myCounter") + 1
Application.Unlock
Of course there are concurrency issues if the lock is held for a long time, especially if there are other uses of the application object that are unrelated to the code holding the lock.
Hence you should avoid a design that requires a long lock on the application object.
If one page tries to lock the Application object while it is already locked, it will wait until the page holding the lock has released it. This will normally be quick (ASP code should only generally hold the lock for long enough to access the shared object that's stored in Application).