Shared global variables among apps - django

Suppose I have two apps: data and visual.
App data executes a database retrieval on startup. This thread and this doc advise on how and where to place code that runs once at startup. So, in app data:
# apps.py
from django.apps import AppConfig

global_df = None  # want to declare a global variable to be shared across all apps here


class DataConfig(AppConfig):
    # ...
    def ready(self):
        from .models import MyModel
        ...
        df = retrieve_db()  # retrieve model instances from the database
        ...
        return df
In the code above, I am looking to execute ready() once at startup and assign df to a shared global variable (in this case, global_df). App visual should be able to access this global_df (through an import, maybe), but any further modification of global_df should be done only in app data.
This thread advised placing any global variable in the app's __init__.py file, but it mentioned that this only works for environment variables.
Two questions:
1 - How and where do I declare such a global variable?
2 - On startup, how do I pass the output of a function that executes only once to this global variable?
Some threads discuss Redis for caching purposes, but I am not looking for that solution, as it seems like overkill for the problem I am having.
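For reference, a minimal sketch of the module-level pattern described above. The module name data/cache.py, the helper set_global_df(), and the import location of retrieve_db() are all illustrative assumptions, not from the original question:

# data/cache.py  (hypothetical module that owns the shared object)
global_df = None


def set_global_df(df):
    global global_df
    global_df = df


# data/apps.py
from django.apps import AppConfig


class DataConfig(AppConfig):
    name = 'data'

    def ready(self):
        from . import cache
        from .loaders import retrieve_db  # assumption: the asker's loader lives here
        cache.set_global_df(retrieve_db())


# visual/views.py  (read-only access from the other app)
from data import cache


def my_view(request):
    df = cache.global_df  # read it here; only app data reassigns it
    ...

Note that app visual should import the module (from data import cache) rather than the name (from data.cache import global_df), because the latter copies the binding at import time and would never see the value assigned later in ready(). Also, under a multi-process server such as Gunicorn or uWSGI, each worker process holds its own copy of this object.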

Related

Gunicorn reflect changed code dynamically

I am developing a Django web application where a user can modify the code of certain classes, in the application itself, through a UI using the Ace editor (think of it as GitLab/GitHub where you can change code online). But these classes are run by Django and a Celery worker at some point.
Once code changes are saved, they are not picked up by Django because of Gunicorn, but they work fine with Celery because it is a different process (running it locally using runserver works fine, and changes are picked up by both Django and Celery).
Is there a way to make Gunicorn reflect the changes in a certain directory that contains the classes without reloading the whole application? And if reloading is necessary, is there a way to reload Gunicorn's workers one by one without any downtime?
The Gunicorn command:
/usr/local/bin/gunicorn config.wsgi --bind 0.0.0.0:5000 --chdir=/app
The WSGI configuration file:
import os
import sys
from django.core.wsgi import get_wsgi_application
app_path = os.path.abspath(os.path.join(
    os.path.dirname(os.path.abspath(__file__)), os.pardir))
sys.path.append(os.path.join(app_path, 'an_application'))
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "config.settings.production")
application = get_wsgi_application()
The reload option is "intended for development". There's no strong wording saying you shouldn't use it in production. The reason you shouldn't use it in production is that people make typos, a change in one file may need several other changes in other files, and so on. So you can make your site inaccessible, and then you don't have a working app to fix it again.
For a dev, that's no problem, as you look at the logs/output in your shell and restart it. This is why @Krzysztof's suggestion is the best one: push the code changes to your repo, make them go through CI/CD, and switch over the pod. If CI fails, then CD won't happen, so you're good.
Of course, that's a scope far too large for a Q&A site.
Why not save the code in a separate text file or a database, and have the relevant method simply load the code dynamically as a string and execute it using exec()?
Let's say you have a function function1 which can be edited by a user. When the user submits the changes, process the input (separate out the functions so that you know which function has what definition) and save them all individually, like function1, function2, etc., in a database or a text file as strings.
Once you need to execute function1, just load the value you saved and use exec to execute the code.
This way, you won't need to reload gunicorn since all workers will always fetch the updated function definition at run time!
Something in the lines of:
def function1_original():
    # load the user-defined source as a string
    with open("function1.txt", "r") as f:
        source = f.read()
    # executing the string only defines function1 inside `namespace`
    namespace = {}
    exec(source, namespace)
    namespace["function1"]()  # this will execute the user-defined function
So the user will define:
def function1():
    # user defined code
    # blah blah
    ...
I was able to solve this by changing the extension of the Python scripts to anything but .py.
Then I loaded these files using the following function:
from os import path

from importlib import util
from importlib.machinery import SourceFileLoader


def load_module(module_name):
    # placeholder path kept from the original answer
    module_path = path.join(path.dirname(__file__),
                            "path/to/your/files/{}.anyextension".format(module_name))
    spec = util.spec_from_loader(module_name,
                                 SourceFileLoader(module_name, module_path))
    module = util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
In this case, they are not loaded by Gunicorn into RAM, and I was able to apply the changes on the fly without needing the eval or exec functions.
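A usage sketch, assuming one of the renamed scripts defines function1 as in the earlier answer (the module name here is hypothetical and must match the placeholder path inside load_module()):

# hypothetical call site
mod = load_module("function1")
mod.function1()  # re-running load_module() later picks up edits without reloading Gunicorn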

Accessing the properties of the mongoengine instance in a Flask app

I've registered the flask-mongoengine extension in my Flask app and initialized it.
I now want to access its conn property as I want to do pure MongoDB queries too.
>> app.extensions
{'csrf': <flask_wtf.csrf.CSRFProtect object at 0x00000213B72AB940>,
'mail': <flask_mail._Mail object at 0x00000213B72ABCF8>,
'mongoengine': {
<flask_mongoengine.MongoEngine object at 0x00000213B72ABDD8>: {
'app': <Flask 'app'>,
'conn': MongoClient(host=['xxx'], document_class=dict, tz_aware=False, connect=True, ssl=True, replicaset='Cluster0-shard-0', authsource='admin', retrywrites=True, read_preference=Primary())
}
},
'rq2': <flask_rq2.app.RQ object at 0x00000213B5DE8940>,
'security': <flask_security.core._SecurityState object at 0x00000213B734EE10>
}
Is there a way to access the conn property other than the very convoluted (and error-prone):
>> list(app.extensions.get('mongoengine').values())[0].get('conn')
MongoClient(host=['xxx'], document_class=dict, tz_aware=False, connect=True, ssl=True, replicaset='Cluster0-shard-0', authsource='admin', retrywrites=True, read_preference=Primary())
Does flask-mongoengine have a method to access its properties?
The MongoEngine instance has a connection attribute, use that.
You don't need to use app.extensions; that's more of an internal data structure for extensions to keep track of their own state when they need to access this from the current app context.
In your own code, just keep a reference to the MongoEngine instance you created. The documentation uses:
db = MongoEngine(app)
or
db = MongoEngine()
# in an app factory, attach to an app with db.init_app(app)
and so you can use:
db.connection
Next, there is also a current_mongoengine_instance() utility function that essentially gives you the same object as what your code already achieved. Use it like this:
from flask_mongoengine import current_mongoengine_instance
current_mongoengine_instance().connection
As a side note: the way this extension uses app.extensions is ... over-engineered and redundant. The rationale in the source code is:
# Store objects in application instance so that multiple apps do not
# end up accessing the same objects.
but multiple apps already have separate app.extensions dictionaries. While this method does let you use multiple mongoengine connections, you still then can't use this data structure to distinguish between different connections with only the current app context. The implementation for current_mongoengine_instance() only further illustrates that the extension doesn't have a proper strategy for handling multiple connections. You just get the 'first' one, whatever that may mean in the context of unordered dictionaries. The Flask SQLAlchemy extension instead uses a single extension instance to manage multiple connections via a system called binds.
You can also reach the underlying pymongo Collection from any of your Document classes with:
class MyDocument(Document):
    name = StringField()

MyDocument(name='John').save()

coll = MyDocument._get_collection()
print(coll.find_one({'name': 'John'}))

Django: Signal/Method called after "AppConfig.ready()"

I have an AppConfig.ready() implementation which depends on the readiness of another application.
Is there a signal or method (which I could implement) that gets called after all applications' ready() methods have been called?
I know that Django processes the apps in the order of INSTALLED_APPS.
But I don't want to enforce a particular ordering of INSTALLED_APPS.
Example:
INSTALLED_APPS = [
    'app_a',
    'app_b',
    ...
]
How can "app_a" receive a signal (or method call) after "app_b" processed AppConfig.ready()?
(reordering INSTALLED_APPS is not a solution)
I'm afraid the answer is no. Populating the application registry happens in django.setup(). If you look at the source code, you will see that neither apps.registry.Apps.populate() nor django.setup() dispatches any signals upon completion.
Here are some ideas:
You could dispatch a custom signal yourself, but that would require that you do that in all entry points of your Django project, e.g. manage.py, wsgi.py and any scripts that use django.setup().
You could connect to request_started and disconnect when your handler is called (sketched after this list).
If you are initializing some kind of property, you could defer that initialization until the first access.
Whether any of these approaches works for you obviously depends on what exactly you are trying to achieve.
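A minimal sketch of the second idea, assuming app_a wants to run code once after every app has been set up. The config class and receiver names are illustrative; the handler disconnects itself on the first request:

# app_a/apps.py
from django.apps import AppConfig
from django.core.signals import request_started


def run_once(sender, **kwargs):
    # every AppConfig.ready() has long since finished by the time a request arrives
    request_started.disconnect(run_once, dispatch_uid='app_a_run_once')
    ...  # do the initialization that depends on the other app


class AppAConfig(AppConfig):
    name = 'app_a'

    def ready(self):
        request_started.connect(run_once, dispatch_uid='app_a_run_once')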
So there is a VERY hackish way to accomplish what you might want...
Inside django.apps.registry is the singleton apps, which is used by Django to populate the applications. See setup() in django/__init__.py.
The way that apps.populate works is it uses a non-reentrant (thread-based) locking mechanism to only allow apps.populate to happen in an idempotent, thread-safe manner.
The stripped down source for the Apps class which is what the singleton apps is instantiated from:
class Apps(object):

    def __init__(self, installed_apps=()):
        # Lock for thread-safe population.
        self._lock = threading.Lock()

    def populate(self, installed_apps=None):
        if self.ready:
            return
        with self._lock:
            if self.ready:
                return
            for app_config in self.get_app_configs():
                app_config.ready()
            self.ready = True
With this knowledge, you could create some threading.Thread objects that wait on some condition. These consumer threads use threading.Condition to send cross-thread signals (which enforces the ordering you want). Here is a mocked-out example of how that would work:
import threading

from django.apps import apps, AppConfig

# here we are using the "apps._lock" to synchronize our threads, which
# is the dirty little trick that makes this work
foo_ready = threading.Condition(apps._lock)


class FooAppConfig(AppConfig):
    name = "foo"

    def ready(self):
        t = threading.Thread(name='Foo.ready', target=self._ready_foo,
                             args=(foo_ready,))
        t.daemon = True
        t.start()

    def _ready_foo(self, foo_ready):
        with foo_ready:
            # setup foo
            foo_ready.notify_all()  # let everyone else waiting continue


class BarAppConfig(AppConfig):
    name = "bar"

    def ready(self):
        t = threading.Thread(name='Bar.ready', target=self._ready_bar,
                             args=(foo_ready,))
        t.daemon = True
        t.start()

    def _ready_bar(self, foo_ready):
        with foo_ready:
            foo_ready.wait()  # wait until foo is ready
            # setup bar
Again, this ONLY allows you to control the flow of the ready() calls from the individual AppConfigs. This doesn't control the order in which models get loaded, etc.
But if your first assertion was true (you have an AppConfig.ready implementation that depends on another app being ready first), this should do the trick.
Reasoning:
Why Conditions? The reason this uses threading.Condition over threading.Event is two-fold. Firstly, conditions are wrapped in a locking layer. This means that you will continue to operate under controlled circumstances if the need arises (accessing shared resources, etc). Secondly, because of this tight level of control, staying inside the threading.Condition's context will allow you to chain the configurations in some desirable ordering. You can see how that might be done with the following snippet:
lock = threading.Lock()
foo_ready = threading.Condition(lock)
bar_ready = threading.Condition(lock)
baz_ready = threading.Condition(lock)
Why daemonic threads? The reason for this is that, if your Django application were to die sometime between acquiring and releasing the lock in apps.populate, the background threads would continue to spin waiting for the lock to be released. Setting them to daemon mode allows the process to exit cleanly without needing to .join those threads.
You can add a dummy app whose only purpose is to fire a custom all_apps_are_ready signal (or a method call on an AppConfig).
Put this app at the end of INSTALLED_APPS.
If this app receives the AppConfig.ready() method call, you know that all other apps are ready.
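A sketch of such a dummy app (the app name, config class, and file layout are illustrative; the signal name follows the answer above):

# all_apps_ready/signals.py
import django.dispatch

all_apps_are_ready = django.dispatch.Signal()


# all_apps_ready/apps.py  (list 'all_apps_ready' last in INSTALLED_APPS)
from django.apps import AppConfig


class AllAppsReadyConfig(AppConfig):
    name = 'all_apps_ready'

    def ready(self):
        from .signals import all_apps_are_ready
        all_apps_are_ready.send(sender=self.__class__)

app_a can then connect a receiver to all_apps_are_ready inside its own ready(), which runs earlier because it appears earlier in INSTALLED_APPS.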
An alternative solution:
Subclass AppConfig and send a signal at the end of ready. Use this subclass in all your apps. If you have a dependency on one being loaded, hook up to that signal/sender pair.
If you need more details, don't hesitate!
There are some subtleties to this method:
1) Where to put the signal definition (I suspect manage.py would work, or you could even monkey-patch django.setup to ensure the signal is defined everywhere django.setup() is called). You could put it in a core app that is always the first one in INSTALLED_APPS, or somewhere Django will always load before any AppConfigs are loaded.
2) Where to register the signal receiver (you should be able to do this in AppConfig.__init__ or possibly just globally in that file).
See https://docs.djangoproject.com/en/dev/ref/applications/#how-applications-are-loaded
Therefore, the setup is as follows:
When Django first starts up, register the signal.
At the end of every AppConfig.ready(), send the signal (with the AppConfig instance as the sender).
In AppConfigs that need to respond to the signal, register a receiver in __init__ with the appropriate sender.
Let me know how it goes!
If you need it to work for third-party apps, keep in mind that you can override the AppConfigs for these apps (the convention is to place these in a directory called apps). Alternatively, you could monkey-patch AppConfig.
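A sketch of the subclass approach described above. The 'core' app, the signal name app_config_ready, and the receiver method are illustrative assumptions:

# core/signals.py  (in a small 'core' app listed first in INSTALLED_APPS)
import django.dispatch

app_config_ready = django.dispatch.Signal()  # sent with the AppConfig instance as sender


# core/apps_base.py
from django.apps import AppConfig

from core.signals import app_config_ready


class SignallingAppConfig(AppConfig):
    """Base class for the project's apps: announce when ready() has finished."""

    def ready(self):
        super().ready()
        app_config_ready.send(sender=self)


# app_a/apps.py  (reacts once app_b has finished its ready())
from core.apps_base import SignallingAppConfig
from core.signals import app_config_ready


class AppAConfig(SignallingAppConfig):
    name = 'app_a'

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        app_config_ready.connect(self._on_app_ready)

    def _on_app_ready(self, sender, **kwargs):
        if getattr(sender, 'name', None) == 'app_b':
            ...  # initialization that depends on app_b being ready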

Lifetime and scope of class and instance variables in django

While there are quite a few questions and answers here already about the lifetime of different variables within Python, I am looking for how they translate into the Django environment in terms of application scope and endpoint scope. Here is a simple version of what I am making, and I want to ensure that it will behave the way I am expecting it to.
my_cache/models/GlobalCache.py:
# This class should be global to the entire application and only
# load when the server is started.
class GlobalCache(object):
    _cache = {}

    @classmethod
    def fetch(cls):
        return cls._cache

    @classmethod
    def flush(cls):
        cls._cache = {}

    @classmethod
    def load_cache(cls, files_to_load_data_from):
        for file in files_to_load_data_from:
            cls._cache[file] = ...  # load the file and process its data into an entry
my_cache/models/InstanceCache.py:
from .GlobalCache import GlobalCache
# This class will contain a reference to the global cache and use it to look
# up entries.
class InstanceCache(object):
    def __init__(self, name=None):
        self._name = name
        self._cache = GlobalCache.fetch()

    def fetch_file_data(self, file_name):
        cache_entry = self._cache.get(file_name, None)
        if cache_entry is None:
            raise EntryNotFoundException()
        return ReadOnlyInterfaceObject(cache_entry)
The intent is for GlobalCache to have a cls._cache value that persists as long as the server is running. Calling GlobalCache.flush() will drop its global reference to the data it was tracking, and calling GlobalCache.load_cache(files_to_load_data_from) will populate it with fresh data from those files.
The InstanceCache object is then intended to hold a reference to the current version of the data and return read-only objects for the different data sets identified by their original file name.
From my testing this seems to work, though I do not really have the InstanceCache object per se. I can load the global cache, retrieve read-only objects from it, and then flush the global cache and load it with new data. The original read-only objects still return the values they were originally loaded with; new requests will use the new data values.
What I want to confirm is that GlobalCache will exist as long as the server is running and will only alter its data through direct calls to flush() and load_cache(), and that when I hit an endpoint and create an InstanceCache, it will keep a reference to the original data only as long as it exists. When execution of the endpoint is done, I would expect it to go out of scope, removing the reference to the global cache; if that was the last reference, the old data goes away and only the new/current data is kept. If it matters, I am running Python 2.7.6 and Django 1.5.12. Solutions that require an upgrade may be useful as well, but an upgrade is not an immediate option for me.
The answer here is a maybe, and it also depends a lot on which app server you are using to run Django (if you are running multi-process).
So, generally speaking, yes, the GlobalCache will retain its cached contents for the lifetime of the process it is in after it has been initialized.
But InstanceCache, on the other hand, is only guaranteed to be garbage collected at some time after there are no more references to it. Garbage collection is a deep field, and there are often teams of people that work on the algorithms, so going into exact scenarios is probably outside the scope of an answer on SO. A popular implementation of Python is PyPy, and you can read more about the garbage collection used in PyPy here.
That said, please remember that most app servers are multi-process. Both uWSGI and Gunicorn spin up child processes to serve requests. So even though GlobalCache is a singleton in its process, there may be several processes, each with its own GlobalCache. And this GlobalCache will ultimately be garbage collected/cleaned up when the process exits. Both uWSGI and Gunicorn will usually kill child processes after the child services some number of HTTP requests.

Multiprogramming in Django, writing to the Database

Introduction
I have the following code which checks to see if a similar model exists in the database, and if it does not it creates the new model:
class BookProfile(models.Model):
    # ...
    def save(self, *args, **kwargs):
        uniqueConstraint = {'book_instance': self.book_instance, 'collection': self.collection}
        # Test for other objects with identical values
        profiles = BookProfile.objects.filter(Q(**uniqueConstraint) & ~Q(pk=self.pk))
        # If none are found create the object, else fail.
        if len(profiles) == 0:
            super(BookProfile, self).save(*args, **kwargs)
        else:
            raise ValidationError('A Book Profile for that book instance in that collection already exists')
I first build my constraints, then search for a model with those values, which I am enforcing must be unique: Q(**uniqueConstraint). In addition, I ensure that if the save method is updating and not inserting, we do not find this object itself when looking for other similar objects: ~Q(pk=self.pk).
I should mention that I am implementing soft delete (with a modified objects manager which only shows non-deleted objects), which is why I must check for myself rather than relying on unique_together errors.
Problem
Right, that's the introduction out of the way. My problem is that when multiple identical objects are saved in quick (or near-simultaneous) succession, sometimes both get added, even though the first being added should prevent the second.
I have tested the code in the shell and it succeeds every time I run it. Thus my assumption is that, with two objects being added, Object A and Object B, the following happens: Object A runs its check upon save() being called. Then the process saving Object B gets some time on the processor. Object B runs that same test, but Object A has not yet been added, so Object B is added to the database. Then Object A regains control of the processor; it has already run its test, so even though identical Object B is now in the database, Object A is added regardless.
My Thoughts
The reason I fear multiprogramming could be involved is that each of Object A and Object B is being added through an API save view, so a request to the view is made for each save; it is not a single request with multiple sequential saves on objects.
It might be the case that Apache is creating a process for each request, and thus causing the problems I think I am seeing. As you would expect, the problem only occurs sometimes, which is characteristic of multiprogramming or multiprocessing errors.
If this is the case, is there a way to make the test and set parts of the save() method a critical section, so that a process switch cannot happen between the test and the set?
Based on what you've described, it seems reasonable to assume that multiple Apache processes are a source of problems. Are you able to replicate the issue if you limit Apache to a single worker process?
Maybe the suggestions in this thread will help: How to lock a critical section in Django?
An alternative approach could be utilizing a queue. You'd just stick your objects to be saved into the queue and have another process doing the actual save. That way you could guarantee that objects were processed sequentially. This wouldn't work well if your application depends on having the object saved by the time the response is returned unless you also had the request processes wait on the result (watching a finished queue for example).
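A rough sketch of that queue idea, using a single in-process writer thread. The names are illustrative, and in a multi-process deployment the queue would need to live in an external broker or task queue rather than in memory:

import queue
import threading

save_queue = queue.Queue()


def _writer():
    # Single consumer: saves are applied one at a time, so the
    # test-and-set inside BookProfile.save() can no longer interleave.
    while True:
        obj = save_queue.get()
        try:
            obj.save()
        finally:
            save_queue.task_done()


threading.Thread(target=_writer, daemon=True).start()

# In the API view, instead of calling profile.save() directly:
# save_queue.put(profile)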
Updated
You may find this info useful. Mr. Dumpleton does a much better job of laying out the considerations than I could attempt to summarize here:
http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading
http://code.google.com/p/modwsgi/wiki/ConfigurationGuidelines especially the Defining Process Groups section.
http://code.google.com/p/modwsgi/wiki/QuickConfigurationGuide Delegation to Daemon Process section
http://code.google.com/p/modwsgi/wiki/IntegrationWithDjango
Find the section of text toward the bottom of the page that begins with:
Now, traditional wisdom in respect of Django has been that it should preferably only be used on single threaded servers. This would mean for Apache using the single threaded 'prefork' MPM on UNIX systems and avoiding the multithreaded 'worker' MPM.
and read until the end of the page.
I have found a solution that I think might work:
import threading

def save(self, *args, **kwargs):
    lock = threading.Lock()
    lock.acquire()
    try:
        # Test and Set Code
        ...
    finally:
        lock.release()
It doesn't seem to break the save method like that decorator does, and thus far I have not seen the error again.
Unless anyone can say that this is not a correct solution, I think this works.
Update
The accepted answer was the inspiration for this change.
It seems I was under the impression that locks were some sort of special voodoo that was exempt from normal logic. Here, lock = threading.Lock() is run each time, thus instantiating a new unlocked lock which can always be acquired immediately.
I needed a single central lock for the purpose, but where could that go unless I had a thread running all the time holding the lock? The answer seemed to be to use file locks, as explained in this answer to the Stack Overflow question mentioned in the accepted answer.
The following is that solution modified to suit my situation:
The Code
The following is my modified DjangoLock. I wished to keep locks relative to the Django root; to do this, I put a custom variable into the settings.py file.
# locks.py
import os
import fcntl

from django.conf import settings


class DjangoLock:

    def __init__(self, filename):
        self.filename = os.path.join(settings.LOCK_DIR, filename)
        self.handle = open(self.filename, 'w')

    def acquire(self):
        fcntl.flock(self.handle, fcntl.LOCK_EX)

    def release(self):
        fcntl.flock(self.handle, fcntl.LOCK_UN)

    def __del__(self):
        self.handle.close()
And now the additional LOCK_DIR settings variable:
# settings.py
import os
PATH = os.path.abspath(os.path.dirname(__file__))
# ...
LOCK_DIR = os.path.join(PATH, 'locks')
That will now put locks in a folder named locks relative to the root of the Django project. Just make sure you give apache write access, in my case:
sudo chown www-data locks
And finally the usage is much the same as before:
import locks

def save(self, *args, **kwargs):
    lock = locks.DjangoLock('ClassName')
    lock.acquire()
    try:
        # Test and Set Code
        ...
    finally:
        lock.release()
This is now the implementation I am using, and it seems to be working really well. Thanks to all who have contributed to the process of arriving at this end.
You need to use synchronization on the save method. I haven't tried this yet, but here's a decorator that can be used to do so.
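The decorator that answer pointed to is not reproduced here; a minimal sketch of the same idea, reusing the DjangoLock class from the update above (the decorator name and usage are illustrative), might look like this:

import functools

from locks import DjangoLock


def synchronized(lock_name):
    """Serialize calls to the wrapped function across processes via a file lock."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            lock = DjangoLock(lock_name)
            lock.acquire()
            try:
                return func(*args, **kwargs)
            finally:
                lock.release()
        return wrapper
    return decorator


# usage on the model:
# @synchronized('BookProfile')
# def save(self, *args, **kwargs):
#     ...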