Multi-tenant Django applications using MongoEngine

I want to build a multi-tenant architecture for a SaaS system. We are using Django as our backend, MongoEngine as our main database layer, and Gunicorn as our web server.
Our clients are a few big companies, so the number of databases and the space they pre-allocate shouldn't be a problem.
The first approach we took was to write a middleware that determines the source of the request and connects to the matching MongoEngine database. Here is the code:
class MongoConnectionMiddleware(object):
    def process_request(self, request):
        if request.user.is_authenticated():
            mongo_connect(request.user.profile.establishment)
And the mongo_connect method:
from mongoengine import connect

def mongo_connect(establishment):
    # Registers the client's database under the "default" alias
    db_name = 'db_client_%d' % establishment.id
    connect(db_name)
This registers db_name under the "default" alias for every MongoEngine request.
But it seems that when many concurrent users from different companies make requests, each one sets the default db_name to its own name.
As an example:
Company A makes a request and connects to database A. While A is doing its work, company B connects to database B. This makes A also connect to B's database mid-request, so A fails to find some ids.
Is there a way to isolate the connection to the Mongo database per request to avoid this problem?

Unfortunately MongoEngine seems to be designed around a very basic use case of a single primary connection and multiple auxiliary connections.
http://docs.mongoengine.org/en/latest/guide/connecting.html#connecting-to-mongodb
To get around the default connection logic, I register the first connection I come across as the default and also add it as a named connection. I then add any subsequent connections as named connections only.
https://github.com/MongoEngine/mongoengine/issues/607#issuecomment-38651532
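A minimal sketch of that registration pattern (the helper name and the tenant-based db names are illustrative assumptions, not MongoEngine API):

from mongoengine import connect

_registered = set()

def register_tenant_db(db_name):
    """Register db_name as a named connection; the first one seen also becomes 'default'."""
    if not _registered:
        # MongoEngine requires a 'default' alias, so the first db doubles as it.
        connect(db_name, alias='default')
    if db_name not in _registered:
        connect(db_name, alias=db_name)  # named connection for explicit switching later
        _registered.add(db_name)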
You can use the switch_db context manager to switch from one connection to another, but as soon as you leave the with block it reverts. It also still requires a default connection.
http://docs.mongoengine.org/en/latest/guide/connecting.html#switch-database-context-manager
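For reference, a minimal sketch of the context manager in use (the model and alias names are made up):

from mongoengine.context_managers import switch_db

with switch_db(MyDocument, 'db_client_2') as MyDocument:
    # Everything in this block queries the 'db_client_2' alias
    doc = MyDocument.objects.first()
# Outside the block the model reverts to its original alias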
You might be able to put it inside a generator function and yield inside the with block to prevent it reverting immediately, though I'm not sure that's valid.
You could use a wrapper of some kind, either a function, a class or a custom QuerySet, that checks the current django/flask session and switches the db to the appropriate connection (a rough sketch follows below).
I'm not sure if a QuerySet can do this, but it would probably be the nicest way if it can.
http://docs.mongoengine.org/en/latest/guide/querying.html#custom-querysets
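As a rough illustration of the wrapper idea (everything app-specific here is an assumption: the session key, the alias naming scheme, and the helper name):

from mongoengine.context_managers import switch_db

def for_tenant(model, request):
    """Return a context manager binding `model` to the requesting tenant's db alias."""
    # Assumes the establishment id was stashed on the session at login.
    alias = 'db_client_%d' % request.session['establishment_id']
    return switch_db(model, alias)

# Usage inside a view:
# with for_tenant(MyDocument, request) as MyDocument:
#     docs = MyDocument.objects(...)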
I included some code in the issue below, where I change the database connection for my models.
https://github.com/MongoEngine/mongoengine/issues/605
def switch(model, db):
    model._meta['db_alias'] = db
    # Must set _collection to None so it is re-evaluated
    model._collection = None
    return model

MyDocument = switch(MyDocument, 'db-alias')
You'll also want to take a look at the code that MongoEngine uses to switch dbs.
Beware that MongoEngine likes to cache things, so changing a few variables here and there doesn't always have an effect. It's full of surprises like this.
Edit:
I should also add that the connect call won't pick up value changes, so calling connect with new parameters won't take effect unless it's a new alias. Even the disconnect function (which isn't exposed publicly) doesn't let you do this, as the models cache the connection. I mention this in some of the issues linked above and also here: https://github.com/MongoEngine/mongoengine/issues/566
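To make that caveat concrete, a sketch of the behaviour being described (db names made up; behaviour as of the MongoEngine versions current when this was written):

from mongoengine import connect

connect('db_a', alias='tenant')  # first call registers the alias
connect('db_b', alias='tenant')  # ignored: the alias already exists and stays bound to db_a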

Related

How to handle network errors when saving to Django Models

I have a Django .save() call that executes in a loop n times.
My concern is how to guard against network errors during saving, as some entries could be saved while others aren't, with no way of telling which.
What is the best way to make sure that the execution completes?
Here's a sample of my code
# SAVE DEBIT ENTRIES
for i in range(len(debit_journals)):
    # UPDATE JOURNAL RECORD
    debit_journals[i].approval_no = journal_transaction_id
    debit_journals[i].approval_status = 'approved'
    debit_journals[i].save()
Either use bulk_create / bulk_update to execute a single DB query, or use transaction.atomic as a decorator on your function so that any error during a save rolls the database back to its state before the function ran.
Try something like the below (assuming your model is named DebitJournal and debit_journals is a list):
for debit_journal in debit_journals:
    debit_journal.approval_no = journal_transaction_id
    debit_journal.approval_status = 'approved'

# One UPDATE query for all rows instead of one query per instance
DebitJournal.objects.bulk_update(debit_journals, ["approval_no", "approval_status"])
If debit_journals is a QuerySet you can also try:
debit_journals.update(approval_no=journal_transaction_id, approval_status='approved')
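For the transaction.atomic alternative mentioned above, a minimal sketch (the function name is made up; the model and fields are from the question):

from django.db import transaction

@transaction.atomic
def approve_journals(debit_journals, journal_transaction_id):
    # If any save() raises, every change made inside this function is rolled back.
    for debit_journal in debit_journals:
        debit_journal.approval_no = journal_transaction_id
        debit_journal.approval_status = 'approved'
        debit_journal.save()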
It depends on what you call a network error: one between the user and the Django application, or one between the Django application and the database. If it's only between the user and the app, note that if the request was sent correctly, the objects will be created even if the user loses the connection afterwards. So the user might not get the response, but the objects will still be created.
If it's between the database and the Django application, some objects might still be created before the error.
Usually if you want an "all or nothing" behaviour you should use manual transactions as described here: https://docs.djangoproject.com/en/4.1/topics/db/transactions/
Note that if the creation takes really long you might hit the request timeout. If it takes more than a few seconds, consider making it a background task; the request is then only there to create the task.
See Python task queue alternatives and frameworks for 3rd party solutions.
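As an illustration, a minimal sketch using Celery (one of the third-party options behind that link; the task name and the model's import path are assumptions):

from celery import shared_task
from django.db import transaction

from myapp.models import DebitJournal  # hypothetical app path

@shared_task
def approve_journals_task(journal_ids, journal_transaction_id):
    # Runs in a worker process, outside the request/response cycle.
    with transaction.atomic():
        DebitJournal.objects.filter(id__in=journal_ids).update(
            approval_no=journal_transaction_id,
            approval_status='approved',
        )

# The view then only enqueues the work:
# approve_journals_task.delay([j.id for j in debit_journals], journal_transaction_id)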

In Django how can I influence the connection process to the DB?

What is the best method to change the way Django connects to the DB (i.e. MySQL)?
For instance what should I do if I needed django to connect to the db via ssh tunnel, whose settings may change dynamically? (I'm planning to use sshtunnel)
I understand I should subclass django.db.backends.mysql.base.DatabaseWrapper and probably call super() from an overridden get_new_connection(self, conn_params)? (See the example below.)
But then how would I plug this custom class into the settings, as it seems that the settings expect a path to a module rather than to a class?
Something along the lines of:
import json

from django.db.backends.mysql.base import DatabaseWrapper

class MyDatabaseWrapper(DatabaseWrapper):
    """Oversimplified example."""

    def get_new_connection(self, conn_params):
        with open('path/to/json.js', 'rt') as file:
            my_conn_params = json.load(file)
        # Merge the dynamic parameters into the defaults before connecting
        conn_params.update(my_conn_params)
        return super().get_new_connection(conn_params)
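On the settings question: Django's DATABASES['default']['ENGINE'] expects a dotted module path to a package whose base submodule defines a class named DatabaseWrapper, so a sketch (with a hypothetical package name mybackend) would be:

# mybackend/base.py
from django.db.backends.mysql import base

class DatabaseWrapper(base.DatabaseWrapper):
    def get_new_connection(self, conn_params):
        # ... adjust conn_params here, e.g. from the sshtunnel state ...
        return super().get_new_connection(conn_params)

# settings.py
DATABASES = {
    'default': {
        'ENGINE': 'mybackend',  # a module path, not a class path
        # NAME, USER, PASSWORD, HOST, PORT as usual
    },
}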

How to override database settings in Django tests?

I'm on Django 1.8 (using pytest) and I have the following configuration:
A default and a readonly database managed by a MasterSlaveRouter that directs DB calls to one connection or the other depending on whether they're read or write operations.
In my development environment, both entries in the settings.DATABASES dictionary have the same setup (they just use a different connection, but the database is the same).
In my test environment, however, there's only a default database.
I have a post_save signal fired whenever a model Foo is saved.
I have an atomic operation (decorated with @transaction.atomic) that modifies a Foo instance and calls .save() on it twice. Since no custom using parameter is passed to the decorator, the transaction is only active on the default database.
The post_save callback creates a Bar record with a OneToOneField pointing to Foo, but only after checking whether a Bar record with this foo_id already exists (in order to avoid IntegrityError). This check is done by performing this query:
already_exists = Bar.objects.filter(foo=instance).exists()
This is ok the first time the post_save callback is called: a Bar record is created and everything works fine. The second time, however, even though such a Bar instance was just created in the previous Foo save, the filtering is a read operation and is therefore performed on the readonly connection. already_exists therefore ends up False, the creation of a new record is triggered, and an IntegrityError is eventually thrown, because when the create operation runs on the default connection there is already a record with that foo_id.
I tried copying the DATABASES dictionary from dev_settings to test_settings, but this broke many tests. I then read about the override_settings decorator and thought it would be perfect for my situation. To my surprise, however, it didn't work. It seems that at some point during application start-up, the DATABASES dictionary (the one with only default from test_settings) is cached, and even though I change settings.DATABASES, the new value is simply never read.
How can I properly override the database configuration for one specific test?
Hum... well if you are using only pytest, I think you'll need to clean up your databases after tests.
Now, to override Django settings, it's good to use:

from django.test import override_settings

@override_settings(DATABASES=<new_config>)
def test_foo():
    pass
You should try pytest-django:

import pytest

# either mark the whole module...
pytestmark = pytest.mark.django_db

# ...or mark a single test
@pytest.mark.django_db
def test_foo():
    pass
When you run your tests you can pass the --create-db flag to force py.test to create a new database, or --reuse-db if you want to reuse your existing one, like:
$ py.test --create-db
$ py.test --reuse-db
Check out the official docs.

django-webtest with multiple test client

In django-webtest, every TestCase subclass comes with self.app, which is an instance of webtest.TestApp; I can then log in as user A via self.app.get('/', user='A').
However, if I want to test the behavior for both user A and user B in one test, how should I do it?
It seems that self.app is just DjangoTestApp() with extra_environ passed in. Is it appropriate to just create another instance of it?
I haven't tried setting up another instance of DjangoTestApp as you suggest, but I have written complex tests where, after making requests as user A, I then switched to making requests as user B with no issue, in each case passing the user or username in when making the request, e.g. self.app.get('/', user='A') as you have already written.
The only part which did not work as expected was making unauthenticated requests, e.g. self.app.get('/', user=None); these continued to use the user from the request immediately prior.
To reset the app state (which should allow you to emulate most workflows with several users in a sequential manner) you can run self.renew_app() which will refresh your app state, effectively logging the current user out.
To test simultaneous access by more than one user (your question does not specify exactly what you are trying to test), setting up another instance of DjangoTestApp would seem worth exploring, as in the sketch below.
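A minimal, untested sketch of that idea (it assumes DjangoTestApp accepts extra_environ the way webtest.TestApp does, and that the mixin exposes self.extra_environ):

from django_webtest import DjangoTestApp, WebTest

class TwoUserTest(WebTest):
    def test_both_users(self):
        app_a = self.app  # the default instance provided by WebTest
        app_b = DjangoTestApp(extra_environ=self.extra_environ)  # separate state/cookies

        resp_a = app_a.get('/', user='A')
        resp_b = app_b.get('/', user='B')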

Kohana 3.1 Web Services Bootstrapping Based on Environment and Stored Like A Session

We are building an n-tiered style application in Kohana 3.1 which distributes JSONP-powered widgets to our partners based on a partner_id.
Each partner needs to be able to call a widget and specify an environment parameter: test OR production with the initial call, which will be used to select the appropriate database.
We need our bootstrap to watch for $_REQUEST['environment'] variable and then to maintain the state of that variable whenever the partner makes a call to the widget service.
The problem is that all requests in the application use Bootstrap.php, but many of the requests are internal, i.e. they do not come with a partner_id or environment variable. We tried to use sessions to store these, but since these are browser-less, server-to-server GET/POST calls, there is no cookie in which to store and recall the session id.
Does anyone have any suggestions? We realise we could pass the environment variable with every single call, internal or external, but this does not seem very robust.
We have a config file which stores partner settings (indexed by partner_id), such as the width and height of the widget and we thought about storing the partner's environment in here, but not all calls to the server would be made by a partner, so we would still need another way to trigger the environment for other calls and select the correct DB.
We also thought of storing a flat file for the partner which maintains the last requested environment, but again, as we have many internal requests after the initial one, we don't always have a knowledge (i.e. we don't usually care) which partner_id is used in the initial call.
Hope this makes sense...!
The solution would be to call the models and methods that are needed to 'do stuff' from a single controller, keeping the partner_id only in the controller and sending the requested data back once all of the 'do stuff' methods have been run, as per the MVC model.
i.e., request from partner -> route -> controller -> calls models etc -> passes back to controller -> returns view to partner
That allows the partner_id to be kept by the controller and only passed to whatever models require it to 'do stuff', keeping within the MVC framework.
If you've not kept within the confines of MVC, then things will obviously get more complex and you'll need to store the variable somewhere.