Load data in memory during server startup in a Django app

I am creating a Django app in which I need to load some data and keep it in memory when the server starts, for quick access. To achieve this I am using Django's AppConfig class. The code looks something like this:
from django.apps import AppConfig
from django.core.cache import cache

class myAppConfig(AppConfig):
    name = 'myapp'

    def ready(self):
        # Import models here, once the app registry is ready
        from .models import myModel
        data = myModel.objects.values('A', 'B', 'C')
        cache.set('mykey', data)
The problem is that this data in the cache will expire after some time. One alternative is to increase the TIMEOUT value, but I want the data to be available in memory all the time. Is there some other configuration or approach I can use to achieve this?
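Note that Django's cache framework itself supports never-expiring entries: passing timeout=None caches the value forever, though the backend may still evict it under memory pressure. A minimal sketch of the call above:

# timeout=None means "never expire" (subject to backend eviction policy)
cache.set('mykey', data, timeout=None)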

If you are using Redis, it does provide persistent data storage.
Check it out at https://redis.io/topics/persistence
You have to set it up through the Redis configuration.
There are two methods to keep the data persistent across a reboot; both of them save the data to disk, in different formats. I suggest you set up AOF for your purpose, since it records every write operation. However, it will use more disk space.
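A minimal sketch of the relevant redis.conf directives, assuming the default once-per-second fsync policy is acceptable:

appendonly yes          # enable append-only-file (AOF) persistence
appendfsync everysec    # fsync the AOF once per second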

Django settings.py

At login, my project loads some settings.py variables to enable certain behaviors:
unit_id = settings.COMPANY
When another logged-in user changes the value of this variable through a function, the change is reflected for all other users who are already active:
settings.COMPANY = "coke"
In this case, all users will see "coke" in settings.COMPANY. I believed this would live in memory and apply only to the session of the user in question, because I did not write to the physical file.
I wonder if this is how Django handles settings.py variables: does a change propagate dynamically to all instances opened by all users?
This variable is accessed by context_processors.py, below:
def units(request):
    unit_id = settings.COMPANY
You should not change settings at runtime.
This is (mainly) because Django doesn't know anything about its own runtime: it is entirely possible to run multiple instances of the same Django installation, and changing a setting like this will not propagate to any other process.
I wonder if this is how Django handles the settings.py environment variables: Does it propagate dynamically to all instances opened by all users?
Django doesn't run an instance for every user. There are one or more processes (more than one if, for example, you use something like gunicorn, or multiple servers behind a load balancer) listening on a certain port.
To have a changeable setting, you could specify a default value in settings.py, but you should store something like the active company in the database.
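A minimal sketch of that idea; the CompanySetting model and its fields are hypothetical, not part of the original project:

from django.conf import settings
from django.db import models

class CompanySetting(models.Model):
    # Hypothetical model: a single row holds the currently active company.
    active_company = models.CharField(max_length=100)

    @classmethod
    def get_active(cls):
        row = cls.objects.first()
        # Fall back to the static default from settings.py
        return row.active_company if row else settings.COMPANY

Because every request reads the value from the database, a change made by one user is seen consistently by all processes.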

Connection issues implementing Django Custom Storage backend with Box Cloud Storage API

I am working on a legacy Django application that has a lot of third-party dependencies, one of which is the storage backend for file uploads. I was recently tasked with replacing our legacy third-party cloud storage vendor with a newer cloud storage vendor (Box).
The cloud storage is implemented as a custom storage backend and used as the "storage" parameter in FileFields in models throughout the app. I'm basically trying to figure out what exactly happens in storage when you have a FileField in a model, create a ModelForm based on that model, and then call "save" on the form.
It seems that a lot of stuff is going on and some of it is causing connection problems with the cloud storage API.
I tried reading the Django source to follow the flow, and got all the way down to where the model decides whether to do an update (by doing an "exists" check in storage) or an insert.
Once it decides to do an insert, I noticed that a call to my cloud storage backend occurs to upload the file (presumably non-blocking?) while the insert SQL is being generated.
Somewhere in here, connections to the cloud storage begin to hang and become unresponsive. At least, all I see in the logs is

INFO requests.packages.urllib3.connectionpool - Starting new HTTPS connection (1): upload.box.com

and no further info or response.
Unlike the previous cloud storage, the new one issues JWTs per session instead of having a static auth token that you simply pass every time. If I do not use Django's ModelForm with its magical "save" method, but instead call methods directly on the models with the FileFields, I do not encounter the connection problem; I get responses just fine from the cloud storage API.
So I'm thinking there must be some kind of concurrency issue when calling "save" on a form that affects a model with a FileField? I'm a little stumped. The code is involved, so it is hard to copy here, but basically it comes down to:
from django.core.files.storage import Storage
from django.db import models
from django.forms import ModelForm

class CustomStorage(Storage):
    def __init__(self):
        # set up the storage API client and
        # authenticate the client instance, etc.
        ...

    def _save(self, name, content):
        # call storage API methods to upload the file;
        # includes a retry loop with a file-renaming algorithm
        # to avoid name conflicts, as the cloud API does not
        # allow duplicate file names
        ...

    def exists(self, name):
        # call storage API methods to determine if a file name conflict exists
        ...

    def _open(self, name, mode='rb'):
        # call storage API methods to download the file
        ...

custom_storage = CustomStorage()

class ExampleModel(models.Model):
    name = models.CharField(max_length=255)
    # FPFileField comes from our dependency on FilePicker, now called FileStack
    file_ref = FPFileField(upload_to="uploads", storage=custom_storage)

class ExampleModelForm(ModelForm):
    file_ref = CustomFilePickerField()

    class Meta:
        model = ExampleModel
        fields = ('name', 'file_ref')

form = ExampleModelForm()
model = form.save()  # --> connection problem with cloud storage API starts here
                     # if I were to call ExampleModel.objects.create(...),
                     # the storage upload process would work fine
Is there some gotcha I'm not aware of that Django experts would know about implementing custom storage backends for Django based on cloud storage APIs?
Turns out the problem was our use of a deprecated version of a FilePicker field for the uploaded-document model field, combined with an old bug in a stale version of Requests. Occasionally, this field would return a Django File instance wrapped around a cStringIO.StringIO object instead of a vanilla file object. This triggered a bug in the Requests library that caused stalled responses when chunking a multi-part POST whose payload is a StringIO instance. Because upgrading is not an option, I solved the issue by detecting when the underlying Django FileField file is not a real file object, re-wrapping it in a ContentFile instance, and seeking to 0, in order to play nice with the older version of Requests. If anyone knows of a better alternative, by all means, please let me know.
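A minimal sketch of that workaround, assuming a Python 2 codebase (hence the builtin file type check); the helper name is illustrative:

from django.core.files.base import ContentFile

def normalize_upload(fieldfile):
    # Hypothetical helper: older Requests stalls on StringIO payloads
    # during multi-part POSTs, so re-wrap anything that isn't a real
    # file object in a ContentFile and rewind it before uploading.
    if not isinstance(fieldfile.file, file):  # Python 2 builtin file type
        wrapped = ContentFile(fieldfile.read(), name=fieldfile.name)
        wrapped.seek(0)
        return wrapped
    return fieldfile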

How do I turn off django-cumulus in my local_settings.py?

I have taken over a project that uses django-cumulus for cloud storage. On my development machine I sometimes have a slow internet connection, and every time I save a change, Django reloads and tries to make a connection to the Rackspace store:
Starting new HTTPS connection (1): identity.api.rackspacecloud.com
This sometimes takes 15 seconds and is a real pain. I read a post where someone said they turned off cumulus for local development, I think by setting DEFAULT_FILE_STORAGE, but unfortunately the poster did not specify how. If someone knows a simple setting I can put in my local settings to serve media and static files from my local machine, and stop Django trying to connect to my cloud storage on every save, that is what I want to do.
Yeah, it looks like you just need DEFAULT_FILE_STORAGE set to its default value, which is django.core.files.storage.FileSystemStorage according to the source code.
However, a better approach would be to not set anything in your local settings, and instead set DEFAULT_FILE_STORAGE and CUMULUS in a staging_settings.py or prod_settings.py file.
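A minimal sketch of the local override; the media path is illustrative and should match your machine:

# local_settings.py -- serve files locally instead of from Rackspace
DEFAULT_FILE_STORAGE = 'django.core.files.storage.FileSystemStorage'
MEDIA_ROOT = '/home/me/myproject/media'  # illustrative path
MEDIA_URL = '/media/'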
The constant reconnection to the Rackspace store happened because the previous developer had

from cumulus.storage import SwiftclientStorage

class PrivateStorage(SwiftclientStorage):
    ...
and in models.py
from common.storage import PrivateStorage

PRIVATE_STORE = PrivateStorage()
...

class Upload(models.Model):
    upload = models.FileField(storage=PRIVATE_STORE, upload_to=get_upload_path)
This meant that every time the project reloaded, it would create a new HTTPS connection to Rackspace, and time out if the connection was poor. I created a settings flag to control this, guarding the import of SwiftclientStorage and the definition of PrivateStorage like so:
from django.conf import settings

if settings.USECUMULUS:
    from cumulus.storage import SwiftclientStorage

    class PrivateStorage(SwiftclientStorage):
        ...
else:
    class PrivateStorage():
        pass

Multi-tenant Django applications using Mongoengine

I want to build a multi-tenant architecture for a SaaS system. We are using Django as our backend, MongoEngine as our main database layer, and gunicorn as our web server.
Our clients are a few big companies, so the number of databases (and the space MongoDB pre-allocates for each) shouldn't be a problem.
The first approach we took was to write a middleware that determines the source of the request and connects to the appropriate MongoEngine database. Here is the code:
class MongoConnectionMiddleware(object):
    def process_request(self, request):
        if request.user.is_authenticated():
            mongo_connect(request.user.profile.establishment)
And the mongo_connect method:
from mongoengine import connect

def mongo_connect(establishment):
    db_name = 'db_client_%d' % establishment.id
    connect(db_name)
This registers the "default" alias as db_name for every MongoEngine request.
But it seems that when many concurrent users from different companies are making requests, each one sets the default db_name to its own name.
As an example:
Company A makes a request and connects to database A. While A is doing its work, company B connects to database B. This makes A also connect to B's database mid-process, so A fails to find some ids.
Is there a way to isolate the connection to the mongo database per request, to avoid this problem?
Unfortunately, MongoEngine seems to be designed around the very basic use case of a single primary connection plus multiple auxiliary connections.
http://docs.mongoengine.org/en/latest/guide/connecting.html#connecting-to-mongodb
To get around the default connection logic, I define the first connection I come across as the default, and also add it as a named connection. I then add any subsequent connections as named connections only.
https://github.com/MongoEngine/mongoengine/issues/607#issuecomment-38651532
You can use the switch_db context manager to move from one connection to another, but as soon as you leave the with statement it reverts. It also still requires a default connection.
http://docs.mongoengine.org/en/latest/guide/connecting.html#switch-database-context-manager
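A minimal sketch of that context manager, assuming 'db_client_2' was previously registered as a named connection alias:

from mongoengine.context_managers import switch_db

# Queries inside the block run against the named alias;
# the model reverts to its default connection on exit.
with switch_db(MyDocument, 'db_client_2') as TenantDocument:
    docs = TenantDocument.objects.all()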
You might be able to put it inside a function and then yield inside the with to prevent it reverting immediately, though I'm not sure whether that is valid.
You could use a wrapper of some kind (a function, a class, or a custom QuerySet) that checks the current django/flask session and switches the db to the appropriate connection.
I'm not sure whether a QuerySet can do this, but it would probably be the nicest way if it can.
http://docs.mongoengine.org/en/latest/guide/querying.html#custom-querysets
I included some code in this issue, where I change the database connection for my models:
https://github.com/MongoEngine/mongoengine/issues/605
def switch(model, db):
    model._meta['db_alias'] = db
    # must set _collection to None so it is re-evaluated
    model._collection = None
    return model

MyDocument = switch(MyDocument, 'db-alias')
You'll also want to take a look at the code that MongoEngine uses to switch dbs.
Beware that MongoEngine likes to cache things, so changing a few variables here and there doesn't always have an effect. It's full of surprises like this.
Edit:
I should also add that the connect call won't pick up value changes, so calling connect with new parameters won't take effect unless it's a new alias. Even the disconnect function (which isn't exposed publicly) doesn't let you do this, as the models cache the connection. I mention this in some of the issues linked above and also here: https://github.com/MongoEngine/mongoengine/issues/566

Updating a hit counter when an image is accessed in Django

I am working on some simple analytics for a Django website (v1.4.1). Seeing as this data will be gathered on pretty much every server request, I figured the right way to do this would be with a piece of custom middleware.
One important metric for the site is how often given images are accessed. Since each image is its own object, I thought about using django-hitcount, but figured that was unnecessary for what I was trying to do. If it proves easier, I may use it though.
The current conundrum I face is that I don't want to query the database and look up a given object for every HttpRequest that occurs. Instead, I would like to wait until a successful response (indicated by an HttpResponse status of 200 or whatever), and then query the database and update a hit field for the corresponding image. The catch is that the only way to access the path of the image is in process_request, while the only way to access the status code is in process_response.
So, what do I do? Is it as simple as creating a class variable that can hold the path and then looking up the file once a response code of 200 is returned, or should I just use django-hitcount?
Thanks for your help.
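For what it's worth, a minimal sketch of the middleware idea; note that it stores the path on the request object rather than in a class variable, since class state would be shared across concurrent requests:

class ImageHitMiddleware(object):
    def process_request(self, request):
        # Remember the path; no database work yet.
        request._hit_path = request.path

    def process_response(self, request, response):
        path = getattr(request, '_hit_path', None)
        if path and response.status_code == 200:
            # Look up the image for this path and bump its hit field here.
            pass
        return response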
Set up a cron task to parse your Apache/Nginx/whatever access logs on a regular basis, perhaps with something like pylogsparser.
You could use memcache to store the counters and then periodically persist them to the database. There is a risk that memcache will evict a value before it has been persisted, but this may be acceptable to you.
This article provides more information and highlights a risk that arises when using hosted memcache with keys distributed over multiple servers: http://bjk5.com/post/36567537399/dangers-of-using-memcache-counters-for-a-b-tests
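A minimal sketch of the counter side using Django's cache API (the key scheme is illustrative); a periodic job would then read these keys and write the totals to the database:

from django.core.cache import cache

def count_hit(path):
    key = 'hits:%s' % path  # illustrative key scheme
    cache.add(key, 0)       # create the counter only if it doesn't exist yet
    cache.incr(key)         # atomic increment on memcached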