Django session variables sometimes get lost in multi-threaded environment - django

I'm trying to cache a set of strings per session by storing each one in their own variable and by using django.contrib.session.
I have the following code:
import copy
def get_result(request, operation):
previous_result = request.session.get(operation.name)
if previous_result:
result = copy.deepcopy(previous_result)
else:
result = get_json_response(operation)
request.session[operation.name] = copy.deepcopy(result)
return result
get_result() is
triggered via ajax requests
used for many different operations which may be called at the same time
may be called multiple times per operation in one session
This code works perfectly fine on my local environment. However, in production server where gevent and chausette is installed, it fails.
Most of the time, request.session.get(operation.name) would return None even when it is not the first time that get_result is called for that operation. In some cases, it returns a value but in some, it doesn't. There seems to be no pattern on when it does and doesn't work.
I suspect that the inconsistency is because different threads are referencing the session variable at different states. What would be the proper way to handle session variables in this case?

I did in fact have the same problems and also tried to save the session properly with the tweaks you posted.
In the end, what solved my problem was changing the default cache in settings.py to
'BACKEND': 'django.core.cache.backends.dummy.DummyCache',
Using FileBasedCache instead helps as well, but it crashes in the local environment (development). Dummy works for local as well as production.

Related

How do I hide the GPU from one tf::Session() in a multi-session scenario?

I have a C++ program that makes use of two different classifiers running in separate tensorflow::Sessions. One of these models needs to run on the GPU, while the second is very small and I'd like for it to run on the CPU.
I tried creating the sessions using:
auto options = tensorflow::SessionOptions();
options.config.mutable_gpu_options()->set_per_process_gpu_memory_fraction(0);
m_session.reset(tensorflow::NewSession(options));
but this seems only to trigger the default "hog the GPU" behaviour.
I also tried playing around with options.config.mutable_gpu_options()->set_visible_device_list("-1") but there is a specific check in the TF code for invalid devices in the list that raises an error.
Setting CUDA_VISIBLE_DEVICES=-1 in the environment before running the program is of course not an option because the GPU should be visible for the session that needs it.
Does anyone know how to forbid just one one session to use the GPU?
An easy workaround would be to temporarily set CUDA_VISIBLE_DEVICES=-1 using putenv("CUDA_VISIBLE_DEVICES=-1"); and reset it after creating the session.
std::string str_tmp = "CUDA_VISIBLE_DEVICES=";
str_tmp += getenv("CUDA_VISIBLE_DEVICES");
putenv("CUDA_VISIBLE_DEVICES=-1");
#Create session
#Reset environment variable
putenv(str_tmp);
However, there might be a cleaner way to do it without changing your environment variables.

Lifetime and scope of class and instance variables in django

While there are quite a few questions and answers on here already about the life of different variables within python I am looking for how they translate into the django environment in terms of application scope and endpoint scopes. Here is a simple version of what I am making and I want to ensure that it will behave the way I am expecting it to
my_cache/models/GlobalCache.py:
# This class should be global to the entire application and only
# load when the server is started.
class GlobalCacheobject):
_cache = {}
#classmethod
def fetch(cls):
return cls._cache
#classmethod
def flush(cls):
cls._cache = {}
#classmethod
def load_cache(cls, files_to_load_data_from):
for file in files_to_load_from:
cls._cache[file] = <load file and process its data into an entry>
my_cache/models/InstanceCache.py:
from .GlobalCache import GlobalCache
# This class will contain a reference to the global cache and use it to look
# up entries.
class InstanceCache(object):
def __init__(self, name=None):
self._name = name
self._cache = GlobalCache.fetch()
def fetch_file_data(self, file_name):
cache_entry = self._cache.get(file_name, None)
if cache_entry is None:
raise EntryNotFoundException()
return ReadOnlyInterfaceObject(cache_entry)
The intent is to have GlobalCache have a cls._cache value that will persist as long as the server is running. Calling GlobalCache.flush() will drop its global reference to the data it was tracking and calling GlobalCache.load(files_to_load_from) will populate a new instance of its data from.
The InstanceCache object is then intended to hold a reference to the current version of the data and return read-only objects for the different data sets identified by their original file name.
From my testing this seems to work, though I do not really have the InstanceCache object per se. I can load the global cache, retrieve read only objects to it and then flush the global, load it with new data. The original read only objects still return the values they were originally loaded with, new requests will use the new data values.
What I want to confirm is that GlobalCache will exist as long as the server is running and only alter its data with direct calls to flush() and load_cache(). And that when I hit an endpoint and create an InstanceCache it will keep a reference to the original data only as long as it exists. When the execution on the end point is done I would expect it to go out of scope removing the reference to the global cache and if that was the last one, it goes away and only the new/current data is kept. If it matters I am running Python 2.7.6 and django 1.5.12. Solutions that require an upgrade may be useful as well but it is not an immediate option for me.
The answer here is a maybe, and it also depends a lot on which app server you are using to run django (if you are running multi-process).
So, generally speaking, yes, the GlobalCache will retain its cached contents for the lifetime of the process it is in after it has been initialized.
But InstanceCache, on the other hand, is only guaranteed to be garbage collected at some time after there are no more references to it. Garbage collection is a deep field and there are often teams of people that work on the algorithms so going into exact scenarios is probably outside the scope of an answer on SO. A popular implementation of python is pypy, and you can read more about the garbage collection used in pypy here.
That said, please remember that most app servers are multi-process. Both uwsgi and gunicorn spin up child processes to serve requests. So even though GlobalCache is a singleton in its process, there may be several processes, each with its own GlobalCache. And, this GlobalCache will ultimately be garbage collected/cleaned up when the process exits. Both uwsgi and gunicorn will usually kill child processes after the child services some number of HTTP requests.

PyMongo find query returns empty/partial cursor when running in a Django+uWsgi project

We developed a REST API using Django & mongoDB (PyMongo driver). The problem is that, on some requests to the API endpoints, PyMongo cursor returns a partial response which contains less documents than it should (but it’s a completely valid JSON document).
Let me explain it with an example of one of our views:
def get_data(key):
return collection.find({'key': key}, limit=24)
def my_view(request):
key = request.POST.get('key')
query = get_data(key)
res = [app for app in query]
return JsonResponse({'list': res})
We're sure that there is more than 8000 documents matching the query, but in
some calls we get less than 24 results (even zero). The first problem we've
investigated was that we had more than one MongoClient definition in our code. By resolving this, the number of occurrences of the problem decreased, but we still had it in a lot of calls.
After all of these investigations, we've designed a test in which we made 16 asynchronous requests at the same time to the server. With this approach, we could reproduce the problem. On each of these 16 requests, 6-8 of them had partial results. After running this test we reduced uWsgi’s number of processes to 6 and restarted the server. All results were good but after applying another heavy load on the server, the problem began again. At this point, we restarted uwsgi service and again everything was OK. With this last experiment we have a clue now that when the uwsgi service starts running, everything is working correctly but after a period of time and heavy load, the server begins to return partial or empty results again.
The latest investigation we had was to run the API using python manage.py with DEBUG=False, and we had the problem again after a period of time in this situation.
We can't figure out what the problem is and how to solve it. One reason that we can think of is that Django closes pymongo’s connections before completion. Because the returned result is a valid JSON.
Our stack is:
nginx (with no cache enabled)
uWsgi
MemCached (disabled during debugging procedure)
Django (v1.8 on python 3)
PyMongo (v3.0.3)
Your help is really appreciated.
Update:
Mongo version:
db version v3.0.7
git version: 6ce7cbe8c6b899552dadd907604559806aa2e9bd
We are running single mongod instance. No sharding/replicating.
We are creating connection using this snippet:
con = MongoClient('localhost', 27017)
Update 2
Subject thread in Pymongo issue tracker.
Pymongo cursors are not thread safe elements. So using them like what I did in a multi-threaded environment will cause what I've described in question. On the other hand Python's list operations are mostly thread safe, and changing snippet like this will solve the problem:
def get_data(key):
return list(collection.find({'key': key}, limit=24))
def my_view(request):
key = request.POST.get('key')
query = get_data(key)
res = [app for app in query]
return JsonResponse({'list': res})
My very speculative guess is that you are reusing a cursor somewhere in your code. Make sure you are initializing your collection within the view stack itself, and not outside of it.
For example, as written, if you are doing something like:
import ...
import con
collection = con.documents
# blah blah code
def my_view(request):
key = request.POST.get('key')
query = collection.find({'key': key}, limit=24)
res = [app for app in query]
return JsonResponse({'list': res})
You could end us reusing a cursor. Better to do something like
import ...
import con
# blah blah code
def my_view(request):
collection = con.documents
key = request.POST.get('key')
query = collection.find({'key': key}, limit=24)
res = [app for app in query]
return JsonResponse({'list': res})
EDIT at asker's request for clarification:
The reason you need to define the collection within the view stack and not when the file loads is that the collection variable has a cursor, which is basically how the database and your application talk to each other. Cursors do things like keep track of where you are in a long list of data, in addition to a bunch of other stuff, but thats the important part.
When you create the collection cursor outside the view method, it will re-use the cursor on each request if it exists. So, if you make one request, and then another really, really fast right after that (like what happened when you applied high load), the cursor might only be half way through talking to the database, and so some of your data goes to the first request, and some to the second. The reason you would get NO data in a request would be if a cursor finished fetching data but hadn't been closed yet, so the next request tried to fetch data from the cursor, and there was none left to fetch in the query.
By moving the collection definition (and by association, the cursor definition) into the view stack, you will ALWAYS get a new cursor when you process a new request. You wont get any cross talking between your cursors and different requests, as each request cycle will have its own.

Flask.socket_io blocking calls when database queries are run

I am trying to use socket_io with my flask application. The problem is when i run database queries, like in the url_route function below. The first time the page loads properly but on consecutive calls the process goes into a blocking state. Even KeyboardInterrupt (Ctrl + c) terminates one of the python processes, i have to manually kill the other one.
One obvious solution would be to use a cache and use another script to run queries on database. Is there any other possible solution which could avoid running separate scripts?
#app.route('/status/<urlMap>')
def status(urlMap):
dictResponse = {}
data = models.Status.query.filter_by(urlmap = urlMap).first()
if data.conversion == "DONE":
dictResponse['conversion'] = 'success'
if data.published == "DONE":
dictResponse['publish'] = 'success'
return render_template('status.html',status = dictResponse)
Also on removing the import flask.ext.socketio and using app.run(host='0.0.0.0') instead of socketio.run(app,host='0.0.0.0') the app runs perfectly. So i think its the async gevent calls thats somehow blocking the process.
Like #Miguel pointed out the problem correctly. monkey patching the standard libraries solved the issue.
monkey.patch_all() solved the problem.

CF extended components suddenly stop working

We have a set of Coldfusion applications that all extended various parts of an application base. I'll provide a bit of code and then explain the issues we are having and see if anyone can shed light on the best way to trouble shoot this:
In our "OnRequestStart" in the app.cfc we have the following line to initiate a user:
if(!structKeyExists(SESSION, 'user'))
SESSION.user = CreateObject("component","cfcs.ds_user");
Then in the ds_user.cfc we call it like so:
component extends="cas.cas_user" displayname="basic_user"{
The application and all its parts run just like they should. However, in a seeming random manner after a while, the application will crash and I have to restart ColdFusion Service to get it running again. The error I get is:
Could not find the ColdFusion component or interface cas.cas_user.
So, for whatever reason after a while, my application decides it cannot find the path to the parent component. The mapping for that cfc is in the application.cfc at the top as so:
THIS.mappings["/cas"] = "#ReplaceNoCase(currpath,ListToArray(THIS.name,'_')[1],'cas30')#assets\cfcs\";
I want to be sure to say this, the application works perfectly as designed for a random amount of time and then it cannot find the parent component and will not find it again until I restart the ColdFusion Service on the server.
I figure this is somehow a memory leak or something, but I have no idea where to start looking to troubleshoot the issue. We have 6 or so other applications that are extended in the same way and work fine and never crash, but this one does.
EDIT: To be more clear on the mappings. Our applications are located:
root.com/app1
root.com/app2
We created mappings to grab cfcs from app2 while in app1 using the method above. The method, while I believe sort of strange, does work in all of our applications.
EDIT: The correct mappings that display for a while are:
/cfcs - D:\www\app1\assets\cfcs
/templates - D:\www\app1\assets\templates
/cas - D:\www\app2\assets\cfcs
/common - D:\www\app3\assets\common_elements
However once the Application goes in "crashed mode", the dump reveals the mappings are as follows:
/cfcs - D:\www\app1\assets\cfcs
/templates - D:\www\app1\assets\templates
/cas - D:\www\app1\assets\cfcs
/common - D:\www\app1\assets\common_elements
And here is how those mappings are defined at the start of the Application.cfc:
currpath = GetDirectoryFromPath(GetCurrentTemplatePath());
THIS.mappings["/templates"] = "#currpath#assets\templates";
THIS.mappings["/cfcs"] = "#currpath#assets\cfcs";
THIS.mappings["/common"] = "#ReplaceNoCase(currpath,ListToArray(THIS.name,'_')[1],'gum')#assets\common_elements\";
THIS.mappings["/cas"] = "#ReplaceNoCase(currpath,ListToArray(THIS.name,'_')[1],'cas30')#assets\cfcs\";
THIS.name = digisign_CAAAFACBFDFFE or
name_var = (arrayLen(meta_array) >= 2) ? meta_array[arrayLen(meta_array) - 1] & '_' : 'root_';
THIS.name = name_var & right(reReplace(hash(getCurrentTemplatePath()), "[^a-zA-Z]","","all"), 64 - len(name_var));
Where could it be failing. It seems the replace statement isn't working and therefore the appname in the path is not being changed from app1 to app2 when setting the mappings. is it possible this is related to this error we are currently working through: http://forums.adobe.com/message/4657868#4657868 We have yet to apply the Update 4 patch on production. However this problem we believe was happening before CF10. And while we have this issue, it only cropped up recently. This application in question has been crashing like this for a long time.
EDIT:
1. I guess when I say "crash" I mean the application gets into a state, where it will not declare the mappings correctly until I restart Coldfusion. I assume the error in our code causes the crash.
2.This is usually where the issue occurs, when doing this check of the SESSION.user var. I believe it has happened as well, it decides it cannot find our datasource. This is rare.
3. At first I thought yes, but actually no, not that many. Throughout our apps we have several names for common mappings. cas common cfcs templates etc. However D:\www\cas is where the application domain.com/cas30 is located. However a legacy version of that app is located at domain.com/cas. The mapping /cas should go to D:\www\cas30\assets\cfcs and works.
4.We have a dev setup and this never happens. (I assumed it was a load issue which is why it doesn't happen on dev). However, our dev environment is structred as so:
D:\www\deva\app1
D:\www\deva\app2
D:\www\devb\app1
D:\www\devb\app2
What we do (which I think is stupid) is we have a file located not in the same dir as the current app. This file is called application_base.cfc. All of the application.cfcs in the other applications extend from this application_base.cfc. They are not extended from other Application.cfc files. (hope that makes sense) In application_base is a init, onrequeststart, and an onerror. I'll post the App.cfc below. Also some setting are read from XML files both in the application base (to determine environment stuff) and at the application level. However we thought that might be causing the issue so the previous developer removed the xml file at the application level.
6.Yes. I'll post the app.cfc and the appbase.cfc so you can view both.
By reinitialize you mean call onapplicationstart or something. Not that I know of.
A few applications we have do:
currpath = GetDirectoryFromPath(GetCurrentTemplatePath());
app_path = ListToArray(currpath,'\');
THIS.name = app_path[ArrayLen(app_path)];
This one does:
meta_array = ListToArray(GetMetaData(this).name,'.');
name_var = (arrayLen(meta_array) >= 2) ? meta_array[arrayLen(meta_array) - 1] & '_' : 'root_';
THIS.name = name_var & right(reReplace(hash(getCurrentTemplatePath()), "[^a-zA-Z]","","all"), 64 - len(name_var));
A few others do this as well. Not sure if it was two different developers or something, but that is the way it is.
Once the app fails, it fails until I restart coldfusion. The app requires login from the domain.com/app page, so (not saying it cant change from request to request) but the request location is always the same where it's failing.
God I wish it wasn't this complex. I recently pulled our current CMS off of alot of this crazy stuff, but we have 7 or 8 applications that are so intertwined with each other and designed to work in dev/prod environments with different paths, its sometimes hard to tell what I can remove and what I can't.
I thought I tried dumping the applicationname from our error handler, but I thought it didn't work unless passed in. I passed through the mappings so I could see them which is how I know digisign is not changing to cas30 like it should in "crash" mode.
I think all the dynamic mappings were so the original developer could just use the same app.cfc template without changing anything. He liked to do stuff like var a = (b) ? (a-c) ? a-f+b : (a+b) ? d : d; : a; h; crap with no comments so it sometimes hard to just read the damn code let alone debug it.
EDIT
I feel like this issue and stackoverflow.com/q/14300915/1229594 issue may be related. I've posted some more details here as well: forums.adobe.com/message/5022377#5022377
First things first: why are you initialising session-oriented stuff in onREQUESTStart()? If you inited that in onSessionStart(), you'd not need to check for its existence every request, which - whilst trivial - is unnecessary overhead, and is simply the wrong code in the wrong place.
Secondly... you quote your error, but don't say where it's happening. Is it happening in that line in onRequestStart()?
If so, do me a favour: put a try/catch around it, and within that write the value of this.mappings to a log file, as well as the value of currPath. How is the value of that variable being derived, btw?
That said, I think if you just put that session.user init code in the right place, it'll solve your problem.
NB: frame this problem as almost certainly not a memory leak (ie: ColdFusion's fault), but your code doing something you did not anticipate (so... err... your fault ;-). This will help focus better on finding the problem. I'm not having a go at you, but "where is my code wrong" is a better approach than "it's probably something else". And more likely to be correct ;-)
Oh... and what version of CF are you on?
Take a look at this and see if it's relevant to your problem.
https://github.com/Mach-II/Mach-II-Framework/wiki/Application-Specific-Mapping-Workaround
If not, then it could have something to do with application specific mappings of the same name, on the same CF server, with those applications having different application names.
Some questions:
Are you assuming the crash is being caused by the code error, or that the code error is occurring because of the crash?
Is the instantiation of the session user the only line of code that you see these path errors?
Do you have any physical directories in your app that have the same name as the mapping names?
Does this occur in any other environments (dev/test)? Is this a clustered environment?
Are there multiple Application.cfc files extending this same Application.cfc?
Is there any code that is directly calling Application.cfc methods?
Are there any bits of code that cause the application to reinitialize itself?
What is determining the meta_array that is being used to derive the application name?
A few observations:
It seems to me that the application name is getting changed or that some other application is overwriting with the same name. This doesn't seem far-fetched as there's an awful lot of dynamic naming going on here. Starting with the application name, it's dependent on the current template's physical location, which could be different from request to request, depending on how the app routes requests. If the current template varies, the application name will vary, and cause the other app-specific mappings to change, which would cause a cascading effect to all the other mappings that use the app name to determine the physical location of those mappings.
Which begs the question: Why is all this dynamic evaluation of the application name and mapping locations even necessary? Can it be simplified or hard-coded? Can you instead use a server mapping? If it doesn't have to be this complex, simplifying it to its barest essentials will help troubleshooting and may clear up the issue entirely.
Finally, can you verify that the application name during normal operation is the same application name being referenced when the errors are occurring?
If they are different, then something is causing the application to execute within a different context (see my initial questions above for clues). A sudden change in the application name would invalidate any existing sessions and force the session user instantiation code to re-run. And because the user component paths are based in part on the application name, the paths may no longer be correct.
But if the application names are the same between normal operation and crash mode, then most likely the currpath variable is being affected by some part of the application being executed in a different physical path than expected. Since currpath is directly used in determining the rest of the mappings, that could certainly explain why an unexpected path could cause the component to go missing.
Because there are so many variables going into deriving these names, you would be well served to log those variables during normal operation and during crash mode. You'll want to see
GetCurrentTemplatePath()
GetDirectoryFromPath(GetCurrentTemplatePath())
THIS.name
meta_array
THIS.mappings
I suspect you'll find something significantly different in these variables when operating normally and when the crash/errors are occurring, and that difference should lead you closer to the answer.