How to override database settings in Django tests? - django

I'm on Django 1.8 (using pytest) and I have the following configuration:
A default and a readonly database managed by a MasterSlaveRouter that directs DB calls to one connection or the other depending on whether they're read or write operations.
In my development environment, both entries in the settings.DATABASES dictionary have the same setup (they just use a different connection, but the database is the same).
In my test environment, however, there's only a default database.
I have a post_save signal fired whenever a model Foo is saved.
I have an atomic operation (decorated with #transaction.atomic) that modifies a Foo instance and calls .save() on it twice. Since no custom using parameter is passed to the decorator, the transaction is only active on the default database.
The post_save callback creates a Bar record with a OneToOneField pointing to Foo, but only after checking whether a Bar record with this foo_id already exists (in order to avoid IntegrityError). This check is done by performing this query:
already_exists = Bar.filter(foo=instance).exists()
This is ok the first time the post_save callback is called. A Bar record is created and everything works fine. The second time, however, even though such a Bar instance was just created in the previous Foo save, since filtering is a read operation, it is performed using the readonly connection, and therefore already_exists ends up containing the value False and the creation of a new record is triggered, which eventually throws an IntegrityError because when the create operation is performed on the default connection, there is already a record with that foo_id.
I tried copying the DATABASES dictionary from dev_settings to test_settings, but this broke many tests. I then read about the override_settings decorator and thought it would be perfect for my situation. For my surprise, however, it didn't work. It seems that at some point, when the application is initiated, the DATABASES dictionary (the one only with default from the test_settings) is cached and then even though I change setting.DATABASES, the new value is simply not accessed anymore.
How can I properly override the database configuration for one specific test?

Hum... well if you are using only pytest, I think you'll need to cleanup your databases after tests.
Now, to override django settings, it's good to :
from django.test import override_settings
#override_settings(DATABASE_CONFIG=<new_config>)
def test_foo():
pass
You should try the pytest-django:
pytestmark = pytest.mark.django_db
#pytest.mark.django_db
def test_foo():
pass
When you run your tests, you can set the create-db param, to force py.test create a new database or if you want to reuse your db, you can set the reuse-db, like:
$ py.test --create-db
$ py.test --reuse-db
checkout:
Oficial docs

Related

How to handle network errors when saving to Django Models

I have a Django .save() execution that loops at n times.
My concern is how to guard against network errors during saving, as some entries could be saved while others won't and there could be no telling.
What is the best way to make sure that the execution is completed?
Here's a sample of my code
# SAVE DEBIT ENTRIES
for i in range(len(debit_journals)):
# UPDATE JOURNAL RECORD
debit_journals[i].approval_no = journal_transaction_id
debit_journals[i].approval_status = 'approved'
debit_journals[i].save()
Either use bulk_create / bulk_update to execute a single DB query, or use transaction.atomic as decorator for your function so that any error on save will rollback your database before your function was run.
Try something like below (I suppose your model name is DebitJournal and debit_journals is a list).
for debit_journal in debit_journals:
debit_journal.approval_no = journal_transaction_id
debit_journal.approval_status = 'approved'
DebitJournal.objects.bulk_update(debit_journals, ["approval_no", "approval_status"])
If debit_journals is a QuerySet you can also try
debit_journals.update(approval_no=journal_transaction_id, approval_status='approved').
It depends of what you call a network error, if it's between the user and the django application or between the django application and the database. If it's only between the user and the app, note that if the request has been sent correctly even if the user lose the connection afterward the objects will be created. So a user might not have the request response, but objects will still be created.
If it's between the database and the django application some objects might still be created before the error.
Usually if you want a "All or Nothing" behaviour you should use manual transaction as described there: https://docs.djangoproject.com/en/4.1/topics/db/transactions/
Note that if the creation is really long you might hit the request timeout. If the creation takes more than a few seconds you should consider making it a background task. The request is only there to create the task.
See Python task queue alternatives and frameworks for 3rd party solutions.

In Django how can I influence the connection process to the DB?

What is the best method to change the way how django connects to the DB (i.e. MySQL)?
For instance what should I do if I needed django to connect to the db via ssh tunnel, whose settings may change dynamically? (I'm planning to use sshtunnel)
I understand I should sub-class the django.db.backends.mysql.base.DatabaseWrapper and probably super()/modify the get_new_connection(self, conn_params)? (see the example below)
But then how would I submit this custom class in the settings, as it seems that the settings expect path to the module, rather than the class?
Something among the lines:
class myDatabaseWrapper(DatabaseWrapper):
"""Oversimplified example."""
def get_new_connection(self, conn_params):
with open('path/to/json.js', 'rt') as file:
my_conn_params = json.load(file)
conn_params.update(my_conn_params)
return super().get_new_connection(my_conn_params)

Multi-tenant Django applications using Mongoengine

I want to build a multi tenant architecture for a SAAS system. We are using Django as our backend and mongoengine as our main database and gunicorn as our web-server.
Our clients are a few big companies, so the number of databases pre-allocating space shouldn't be a problem.
The first approach we took was to write a middleware to determine the source of the request to properly connect to a mongoengine database. Here is the code:
class MongoConnectionMiddleware(object):
def process_request(self, request):
if request.user.is_authenticated():
mongo_connect(request.user.profile.establishment)
And the mongo_connect method:
def mongo_connect(establishment):
db_name = 'db_client_%d' % establishment.id
connect(db_name)
This will register the "default" alias as the db_name for every mongoengine request.
But it seems that when many concurrent users from different companies are making requests, each one sets the default db_name to it's own name.
As an example:
Company A makes a request and connects to database A. While A is making it's work company B connects to database B. This makes A also connect to B's database in the process, so A fails to find some ids.
¿Is there a way to isolate the connection to the mongo database per request to avoid this problem?
Unfortunately MongoEngine seems to be designed around a very basic use case of a single primary connection and multiple auxiliary connections.
http://docs.mongoengine.org/en/latest/guide/connecting.html#connecting-to-mongodb
To get around the default connection logic, I define the first connection I come across as the default, I also add it as a named connection. I then add any subsequent connection as named connections only.
https://github.com/MongoEngine/mongoengine/issues/607#issuecomment-38651532
You can use the with_db decorator to switch from one connection to another, but it's a contextmanager call, which means as soon as you leave the with statement, it will revert. It also still requires a default connection.
http://docs.mongoengine.org/en/latest/guide/connecting.html#switch-database-context-manager
You might be able to put it inside a function and then yield inside the with to prevent it reverting immediately, I'm not sure if this is valid.
You could use a wrapper of some kind, either a function, class or a custom QuerySet, that checks the current django/flask session and switches the db to the appropriate connection.
I'm not sure if a QuerySet can do this, but it would probably be the nicest way if it can.
http://docs.mongoengine.org/en/latest/guide/querying.html#custom-querysets
I included some code in this issue here where I change the database connection for my models.
https://github.com/MongoEngine/mongoengine/issues/605
def switch(model, db):
model._meta['db_alias'] = db
# must set _collection to none so it is re-evaluated
model._collection = None
return model
MyDocument = switch(MyDocument, 'db-alias')
You'll also want to take a look at the code that mongoengine uses to switch dbs.
Beware that mongo engine likes to cache things, so changing a few variables here and there doesn't always cause an effect. It's full of surprises like this.
Edit:
I should also add, that the 'connect' call won't pick up value changes. So calling connect with new parameters wont take effect unless its a new alias. Even the disconnect function (which isn't exposed publically) doesn't let you do this as the models will cache the connection. I mention this in some of the issues linked above and also here: https://github.com/MongoEngine/mongoengine/issues/566

Is it dangerous to use dynamic database TEST_NAME in Django?

We are a small team of developers working on an application using the PostgreSQL database backend. Each of us has a separate working directory and virtualenv, but we share the same PostgreSQL database server, even Jenkins is on the same machine.
So I am trying to figure out a way to allow us to run tests on the same project in parallel without running into a database name collision. Furthermore, sometimes a Jenkins build would fail mid-way and the test database doesn't get dropped in the end, such that subsequent Jenkins build could get confused by the existing database and fail automatically.
What I've decided to try is this:
import os
from datetime import datetime
DATABASES = {
'default': {
# the usual lines ...
TEST_NAME: '{user}_{epoch_ts}_awesome_app'.format(
user=os.environ.get('USER', os.environ['LOGNAME']),
# This gives the number of seconds since the UNIX epoch
epoch_ts=int((datetime.utcnow() - datetime.utcfromtimestamp(0)).total_seconds())
),
# etc
}
}
So the test database name at each test run most probably will be unique, using the username and the timestamp. This way Jenkins can even run builds in parallel, I think.
It seems to work so far. But could it be dangerous? I'm guessing we're safe as long as we don't try to import the project settings module directly and only use django.conf.settings because that should be singleton-like and evaluated only once, right?
I'm doing something similar and have not run into any issue. The usual precautions should be followed:
Don't access settings directly.
Don't cause the values in settings to be evaluated in a module's top level. See the doc for details. For instance, don't do this:
from django.conf import settings
# This is bad because settings might still be in the process of being
# configured at this stage.
blah = settings.BLAH
def some_view():
# This is okay because by the time views are called by Django,
# the settings are supposed to be all configured.
blah = settings.BLAH
Don't do anything that accesses the database in a module's top level. The doc warns:
If your code attempts to access the database when its modules are compiled, this will occur before the test database is set up, with potentially unexpected results. For example, if you have a database query in module-level code and a real database exists, production data could pollute your tests. It is a bad idea to have such import-time database queries in your code anyway - rewrite your code so that it doesn’t do this.
Instead of the time, you could use the Jenkins executor number (available in the environment); that would be unique enough and you wouldn't have to worry about it changing.
As a bonus, you could then use --keepdb to avoid rebuilding the database from scratch each time... On the downside, failed and corrupted databases would have to be dropped separately (perhaps the settings.py can print out the database name that it's using, to facilitate manual dropping).

How do I run a unit test against the production database?

How do I run a unit test against the production database instead of the test database?
I have a bug that's seems to occur on my production server but not on my development computer.
I don't care if the database gets trashed.
Is it feasible to make a copy the database, or part of the database that causes the problem? If you keep a backup server, you might be able to copy the data from there instead (make sure you have another backup, in case you messed the backup database).
Basically, you don't want to mess with live data and you don't want to be left with no backup in case you mess something up (and you will!).
Use manage.py dumpdata > mydata.json to get a copy of the data from your database.
Go to your local machine, copy mydata.json to a subdirectory of your app called fixtures e.g. myapp/fixtures/mydata.json and do:
manage.py syncdb # Set up an empty database
manage.py loaddata mydata.json
Your local database will be populated with data and you can test away.
Make a copy the database... It's really a good practices!!
Just execute the test, instead call commit, call rollback at the end of.
The first thing to try should be manually executing the test code on the shell, on the production server.
python manage.py shell
If that doesn't work, you may need to dump the production data, copy it locally and use it as a fixture for the testcase you are using.
If there is a way to ask django to use the standard database without creating a new one, I think rather than creating a fixture, you can do a sqldump which will generally be a much smaller file.
Short answer: you don't.
Long answer: you don't, you make a copy of the production database and run it there
If you really don't care about trashing the db, then Marco's answer of rolling back the transaction is my preferred choice as well. You could also try NdbUnit but I personally don't think the extra baggage it brings is worth the gains.
How do you test the test db now? By test db do you mean SQLite?
HTH,
Berryl
I have both a full-on-slow-django-test-db suite and a crazy-fast-runs-against-production test suite built from a common test module. I use the production suite for sanity checking my changes during development and as a commit validation step on my development machine. The django suite module looks like this:
import django.test
import my_test_module
...
class MyTests(django.test.TestCase):
def test_XXX(self):
my_test_module.XXX(self)
The production test suite module uses bare unittest and looks like this:
import unittest
import my_test_module
class MyTests(unittest.TestCase):
def test_XXX(self):
my_test_module.XXX(self)
suite = unittest.TestLoader().loadTestsFromTestCase(MyTests)
unittest.TextTestRunner(verbosity=2).run(suite)
The test module looks like this:
def XXX(testcase):
testcase.assertEquals('foo', 'bar')
I run the bare unittest version like this, so my tests in either case have the django ORM available to them:
% python manage.py shell < run_unit_tests
where run_unit_tests consists of:
import path.to.production_module
The production module needs a slightly different setUp() and tearDown() from the django version, and you can put any required table cleaning in there. I also use the django test client in the common test module by mimicking the test client class:
class FakeDict(dict):
"""
class that wraps dict and provides a getlist member
used by the django view request unpacking code, used when
passing in a FakeRequest (see below), only needed for those
api entrypoints that have list parameters
"""
def getlist(self, name):
return [x for x in self.get(name)]
class FakeRequest(object):
"""
an object mimicing the django request object passed in to views
so we can test the api entrypoints from the developer unit test
framework
"""
user = get_test_user()
GET={}
POST={}
Here's an example of a test module function that tests via the client:
def XXX(testcase):
if getattr(testcase, 'client', None) is None:
req_dict = FakeDict()
else:
req_dict = {}
req_dict['param'] = 'value'
if getattr(testcase, 'client', None) is None:
fake_req = FakeRequest()
fake_req.POST = req_dict
resp = view_function_to_test(fake_req)
else:
resp = testcase.client.post('/path/to/function_to_test/', req_dict)
...
I've found this structure works really well, and the super-speedy production version of the suite is a major time-saver.
If you database supports template databases, use the production database as a template database. Ensure that you Django database user has sufficient permissions.
If you are using PostgreSQL, you can easily do this specifying the name of your production database as POSTGIS_TEMPLATE(and use the PostGIS backend).