Django pytest running many migrations after tests

I am running pytest-django with a legacy database that was not created by Django (e.g. all my models use managed=False). In production, everything is fine but in testing Django wants to apply a bunch of curious migrations.
For testing, I have a pre-populated test database and I want my tests to commit changes to the database (because we have logic in db views and triggers that needs to get run). All that is working fine, but afterwards a ton of migrations are run and my test suite's run time goes from 1s to 70s.
The migrations are overwhelmingly this type: ALTER TABLE [DeprecatedLegacyTable] CHECK CONSTRAINT [FK_dbo.DeprecatedLegacyTable_dbo.User_DeprecatedServiceId]. The tables aren't even in any models.py, so I guess Django is digging this up with inspectdb.
I've looked around a bit and it seems this is a "feature" of Django, but it is hurting my workflow. Is there any way to apply these migrations once and for all rather than replay them every test run? I've run makemigrations and showmigrations and there is nothing to apply.
EDIT:
I think that everything is related to TransactionTestCase. pytest-django actually warns that using transaction=True will be slow. Also, I don't think that these are migrations; it is the database flush procedure. The queries being run are the same as when I do django-admin sqlflush. So, I guess I am trying to override that flush behavior.
EDIT2:
What a ride. I see that Django defers to the vendor database module for flush functionality, meaning each vendor can do it differently. I'm using mssql and they chose some questionable operations. Here's the part where they do the ALTER TABLE on every constraint:
COLUMNS = "TABLE_NAME, CONSTRAINT_NAME"
WHERE = "CONSTRAINT_TYPE not in ('PRIMARY KEY','UNIQUE')"
cursor.execute(
    "SELECT {} FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS WHERE {}".format(COLUMNS, WHERE))
fks = cursor.fetchall()
sql_list = ['ALTER TABLE %s NOCHECK CONSTRAINT %s;' %
            (self.quote_name(fk[0]), self.quote_name(fk[1])) for fk in fks]
In the end, I decided to try to monkeypatch the sql_flush functionality to return an empty list since I don't need any actual flushing done.
This is from a conftest.py:
import os
import pytest
from django.conf import settings
@pytest.fixture(scope="session")
def django_db_setup():
    # Turn the database flush procedure into a no-op
    def mock_flush(*args, **kwargs):
        return []
    import django.core.management.sql
    django.core.management.sql.sql_flush = mock_flush
    # Point the default connection at the pre-populated test database
    settings.DATABASES["default"] = {
        "ENGINE": "mssql",
        "HOST": os.environ["SERVER_URL"],
        "NAME": os.environ["TEST_DATABASE"],
    }
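One caveat with patching a module attribute like this: any code that already ran `from module import name` keeps its own reference to the original function, so the patch only reaches callers that look the name up through the module. The sketch below shows the effect with two stand-in modules (`sqlmod` and `flushcmd` are made-up names, not Django's). Whether it matters here depends on how the installed Django version imports `sql_flush`, so it is worth confirming that the flush queries really disappear.

```python
import sys
import types

# Stand-in for a library module that defines the function we want to disable.
sqlmod = types.ModuleType("sqlmod")
exec("def sql_flush():\n    return ['ALTER TABLE ...']\n", sqlmod.__dict__)
sys.modules["sqlmod"] = sqlmod

# Stand-in for a caller that imported the name directly at import time.
flushcmd = types.ModuleType("flushcmd")
exec(
    "from sqlmod import sql_flush\n"
    "import sqlmod\n"
    "def run_direct():\n"
    "    return sql_flush()\n"
    "def run_via_module():\n"
    "    return sqlmod.sql_flush()\n",
    flushcmd.__dict__,
)

# Patch the attribute on the defining module only.
sqlmod.sql_flush = lambda: []

via_module = flushcmd.run_via_module()  # looks the name up on sqlmod: patched
direct = flushcmd.run_direct()          # uses its own reference: unpatched

# Patching the name where it is used covers the direct import as well.
flushcmd.sql_flush = lambda: []
direct_after = flushcmd.run_direct()
```

If the patch turns out not to take effect, patching the name in the module that actually calls it is the usual fix.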

Related

Skipping Django test database creation for read-only, externally managed, high-security, big databases

We need to use real data for one of our many external data sources during Django tests:
data is externally managed and read-only
data is accessed through manage.py inspectdb generated ORM classes
data is highly sensitive and we are not allowed to store fixtures of the actual data
tables are legacy design and will be phased out, hundreds of tables with complex relations, even getting a single record is complex
there is too much to do and I am unwilling to spend the time it would take to generate the fixtures, guarantee they're obfuscated, get approval of the obfuscation and justify keeping it around just to bridge us for a few months
We understand the downsides: This violates test purity and introduces a potential safety risk. We are willing to compromise on both to get us past the next few months when we will phase out this problem data source.
In this case, I need Django to understand that I don't want it to stand up a test database for this source, and just use the actual source so we can run some quick checks and walk away.
What is the simplest way to achieve this, with full understanding and acceptance of the risks and recommendations against?

For us, the solution was a custom test runner.
With help from Django's Advanced Testing Topics documentation, we overrode the default DiscoverRunner like this:
from django.test.runner import DiscoverRunner
def should_create_db(db_name):
    # analyse db_name, a key from DATABASES, to determine whether a test
    # database should be created
    return db_name != 'messy_legacy_database'
class CustomTestRunner(DiscoverRunner):
    # override method from superclass to selectively skip database setup
    def setup_databases(self, **kwargs):
        # 'aliases' is a set of unique keys from settings DATABASES dictionary
        aliases = kwargs.get('aliases')
        filtered = set([i for i in aliases if should_create_db(i)])
        kwargs['aliases'] = filtered
        # 'aliases' now contains only keys which trigger test database creation
        return super().setup_databases(**kwargs)
    # there was no need to override teardown_databases()
Next we update settings.py to use our override instead of the default runner:
TEST_RUNNER = 'path.to.CustomTestRunner'
Finally we tell our test class which databases it can use:
from django.test import TestCase
class OurTest(TestCase):
    databases = [
        'default',
        'messy_legacy_database',
    ]
    def test_messy_legacy_database(self):
        # go nuts on your messy legacy database testing calls
        pass
In this way our tests now skip test database creation for our messy legacy databases, and the logic we test pulls data from the actual data sources, allowing us to implement quick checks to ensure these code paths work.
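The filtering step in the runner can be tried on its own, outside Django. This is a minimal sketch: `filter_aliases` is a hypothetical helper name, and the `None` guard is an added assumption for the case where the runner is called without an `aliases` argument.

```python
def should_create_db(alias):
    # Same rule as in the runner: skip the externally managed database.
    return alias != "messy_legacy_database"


def filter_aliases(aliases):
    # Assumption: a missing 'aliases' argument means there is nothing to filter.
    if aliases is None:
        return None
    return {alias for alias in aliases if should_create_db(alias)}


filtered = filter_aliases({"default", "messy_legacy_database"})
```

Only the aliases left in the filtered set get a test database; the excluded alias keeps pointing at whatever its DATABASES entry names, which is the real data source.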

Run migrations without loading views/urls

I have the following code in one of my views:
@ratelimit(method='POST', rate=get_comment_rate())
def post_comment_ajax(request):
    ...
However, upon the initial ./manage.py migrate, get_comment_rate() requires a table in the database, so I'm unable to run the migrations to create the tables. I ended up with the following error:
django.db.utils.ProgrammingError: relation .. does not exist
Is it possible to run migrations without loading views or is there a better way?

Running migrations triggers the system checks to run, which causes the views to load. There isn't an option to disable this.
It looks like the ratelimit library allows you to pass a callable.
@ratelimit(method='POST', rate=get_comment_rate)
def post_comment_ajax(request):
    ...
This would call get_comment_rate when the view runs, rather than when the module loads. This could be an advantage (the value won't be stale) or a disadvantage (running the SQL query every time the view runs could affect performance).
In general, you want to avoid database queries when modules load. As well as causing issues with migrations, it can cause issues when running tests -- queries can go to the live db before the test database has been created.
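The difference between passing `get_comment_rate()` and `get_comment_rate` can be seen in a small sketch. The `limited` decorator below is a hypothetical stand-in written for illustration, not the ratelimit library's implementation, and `get_comment_rate` only records that it was called instead of querying a database.

```python
import functools

calls = []


def get_comment_rate():
    # Stand-in for a function that would query the database.
    calls.append("query")
    return "5/m"


def limited(rate):
    # Hypothetical decorator factory that accepts a value or a callable.
    def decorator(view):
        @functools.wraps(view)
        def wrapper(*args, **kwargs):
            # Resolve a callable rate only when the view is actually called.
            current = rate() if callable(rate) else rate
            return view(*args, rate=current, **kwargs)
        return wrapper
    return decorator


@limited(rate=get_comment_rate)  # the function itself is passed: no query yet
def lazy_view(rate):
    return rate


calls_after_definition = len(calls)  # nothing ran when the module was loaded
result = lazy_view()                 # the lookup happens on the first call
calls_after_request = len(calls)
```

Writing `rate=get_comment_rate()` instead would run the lookup on the decorator line itself, which is the situation that breaks the initial migrate.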
If you are ok with this risk, one option would be to catch the exception inside get_comment_rate:
from django.db.utils import ProgrammingError
def get_comment_rate():
    try:
        ...
    except ProgrammingError:
        return '1/m' # or some other default
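The fallback behaviour can be checked without a database. In this sketch `ProgrammingError` is a local stand-in for `django.db.utils.ProgrammingError`, and `with_default`, `missing_table`, `working_query`, and the table name are all hypothetical.

```python
class ProgrammingError(Exception):
    """Local stand-in for django.db.utils.ProgrammingError."""


DEFAULT_RATE = "1/m"


def with_default(fetch, default=DEFAULT_RATE):
    # Call fetch(); fall back to the default if the table does not exist yet.
    try:
        return fetch()
    except ProgrammingError:
        return default


def missing_table():
    # Simulates the state before the first migrate.
    raise ProgrammingError('relation "comment_settings" does not exist')


def working_query():
    # Simulates the state once the table exists.
    return "10/m"


rate_before_migrate = with_default(missing_table)
rate_after_migrate = with_default(working_query)
```

Catching only the one exception type keeps unrelated failures visible, for example a connection error.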

How to unittest a django database migration?

We've changed our database, using django migrations (django v1.7+).
The data that exists in the database is no longer valid.
Basically I want to test a migration by, inside a unittest, constructing the pre-migration database, adding some data, applying the migration, then confirming everything went smoothly.
How does one:
hold back the new migration when loading the unittest
I found some stuff about overriding settings.MIGRATION_MODULES but couldn't work out how to use it. When I inspect executor.loader.applied_migrations it still lists everything. The only way I could prevent the new migration was to actually remove the file; not a solution I can use.
create a record in the unittest database (using the old model)
If we can prevent the migration then this should be pretty straightforward. myModel.objects.create(...)
apply the migration
I think I can probably work this out now that I've found the test_executor: set a plan pointing to the migration file and execute it? Um, right? Got any code for that :-D
confirm the old data in the database now matches the new model
Again, I expect this should be pretty easy: just fetch the instance created before the migration and confirm it has changed in all the right ways.
So the challenge is really just working out how to prevent the unittest from applying the latest migration script and then applying it when we're ready?
Perhaps I have the wrong approach? Should I create fixtures, and just confirm that they're all good at the end? Do fixtures get loaded before the migrations are applied, or after they're all done?
By using the MigrationExecutor and picking out specific migrations with .migrate I've been able to, maybe?, roll it back to a specific state, then roll forward one-by-one. But that is popping up doubts; currently chasing down sqlite fudging around due to the lack of an actual ALTER TABLE instruction. Jury still out.

I wasn't able to prevent the unittest from starting with the current database schema, but I did find it is quite easy to revert to earlier points in the migration history:
Where "0014_nulls_permitted" is a file in the migrations directory...
from django.db import connection
from django.db.migrations.executor import MigrationExecutor
executor = MigrationExecutor(connection)
executor.migrate([("workflow_engine", "0014_nulls_permitted")])
executor.loader.build_graph()
NB: running the executor.loader.build_graph between invocations of executor.migrate seems to be a very important part of completing the migration and making things behave as one might expect
The migrations which are currently applicable to the database can be checked with something like:
print([x[1] for x in sorted(executor.loader.applied_migrations)])
[u'0001_initial', u'0002_fix_foreignkeys', ... u'0014_nulls_permitted']
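The one-liner above lists migration names from every installed app. To look at a single app, the (app_label, migration_name) pairs can be filtered first. A minimal sketch, assuming `applied_migrations` iterates as such pairs; `applied_names` and the sample data are hypothetical.

```python
def applied_names(applied, app_label):
    # 'applied' is any iterable of (app_label, migration_name) pairs.
    return sorted(name for label, name in applied if label == app_label)


# Made-up data in the same shape as the entries of applied_migrations.
sample = {
    ("workflow_engine", "0002_fix_foreignkeys"),
    ("workflow_engine", "0001_initial"),
    ("auth", "0001_initial"),
}
names = applied_names(sample, "workflow_engine")
```

In a test this would be called as `applied_names(executor.loader.applied_migrations, "workflow_engine")`.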
I created a model instance via the ORM then ensured the database was in the old state by running some SQL directly:
job = Job.objects.create(....)
from django.db import connection
cursor = connection.cursor()
cursor.execute('UPDATE workflow_engine_job SET next_job_state=NULL')
Great. Now I know I have a database in the old state, and can test the forwards migration. So where 0016_nulls_banished is a migration file:
executor.migrate([("workflow_engine", "0016_nulls_banished")])
executor.loader.build_graph()
Migration 0015 goes through the database converting all the NULL fields to a default value. Migration 0016 alters the schema. You can scatter some print statements around to confirm things are happening as you think they should be.
And now the test can confirm that the migration has worked. In this case by ensuring there are no nulls left in the database.
jobs = Job.objects.all()
self.assertTrue(all([j.next_job_state is not None for j in jobs]))

We have used the following code in settings_test.py to ignore the migration for the tests:
MIGRATION_MODULES = dict(
    (app.split('.')[-1], '.'.join([app, 'nonexistent_django_migrations_module']))
    for app in INSTALLED_APPS
)
The idea here being that none of the apps have a nonexistent_django_migrations_module folder, and thus django will simply find no migrations.
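To see what that expression produces, here is the same mapping as a plain function run over a sample app list. `disabled_migration_modules` is a hypothetical name and the sample apps are only for illustration.

```python
def disabled_migration_modules(installed_apps,
                               marker="nonexistent_django_migrations_module"):
    # Map each app label (the last dotted component) to a module path
    # that does not exist, so no migrations are found for that app.
    return {app.split(".")[-1]: ".".join([app, marker]) for app in installed_apps}


modules = disabled_migration_modules(["django.contrib.auth", "workflow_engine"])
```

Note that an INSTALLED_APPS entry written as a dotted path to an AppConfig class would yield the class name rather than the app label here. Newer Django versions also accept `None` as a value in MIGRATION_MODULES to mark an app as having no migrations, which may be simpler where available.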

Django: Loaddata command after syncdb fails

I'm trying to use fixtures as a DB-agnostic way to get the data into my database, but this is much harder than it should be. I'm wondering what I'm doing wrong...
Specifically, when I do a syncdb followed by a migrate followed by a loaddata I run into trouble, since syncdb already creates data that loaddata tries to read from the dump. This leads to double entries and hence a crashing script.
This seems to be the same problem as described here: https://code.djangoproject.com/ticket/15926
But it's weird to me that this seems to be an ignored issue. Are fixtures not meant to actually put real (live) data in?
If so: is there any Django-format that is meant for this? Or is everyone just dumping data as SQL? And, if so, how would one migrate development data in SQLite to a production database?

syncdb will also load data from fixtures if you have the fixtures named correctly and in the correct location. See this link for more info.
https://docs.djangoproject.com/en/1.3/howto/initial-data/#automatically-loading-initial-data-fixtures
If you do not want the data to load on every syncdb then you will need to change the name of the fixture.
Fixtures are an OK way to load your data; I have used them on a number of projects. On projects with a ton of data, I sometimes write a special load script that takes the data from my data source and loads it into my new Django models. The custom script is a little more work, but gives you more flexibility.
I tend to stay away from using SQL to load data if I can, since SQL is usually DB-specific. If you have to worry about loading into different databases or database versions, avoid it if you can.
"In general, using a fixture is a cleaner method since it’s database-agnostic, but initial SQL is also quite a bit more flexible."

OP here; this is what I came up with so far:
# some_app/management/commands/delete_all_objects.py
from django.core.management.base import BaseCommand, CommandError
from django.db.models import get_models
class Command(BaseCommand):
    help = 'Deletes all objects'
    def handle(self, *args, **options):
        for model in get_models():
            model.objects.all().delete()
And then just run delete_all_objects after syncdb & migrate and before loaddata. I'm not sure I like it, and I'm very surprised it's necessary, but it works.

Django 1.3 and South migrations

I have an existing project which extensively uses South migrations to load data into its tables.
Since upgrading to Django 1.3 our unit tests no longer run because they cannot find the data they rely on.
Is this behaviour due to one of the backwards incompatible changes in 1.3?
Is there an easy way for me to convert all these migrations into fixtures?

Yes, this behavior is due to this change.
There seems to be a workaround in South trunk (see https://bitbucket.org/andrewgodwin/south/changeset/21a635231327 ) so you can try South development version (it is quite stable in my experience).
You may also try changing the DB name in settings (in order to get a clean environment), running ./manage.py syncdb and ./manage.py migrate, and then running ./manage.py dumpdata.

I hit this issue today. Eventually I ended up refactoring my migrations so that they use helper functions to actually insert the data, and then calling the same functions from the setUp() of my tests.
Some hints:
Make your helper functions take the model class as an argument, so you can call them with orm['yourapp.YourModel'] from the migration and with models.YourModel from the test. That also shows the main limitation: South's frozen ORM still works for models whose schema has changed since the migration was written, but the test code can't do that. I was lucky in that this particular model hasn't changed.
If you want to keep the helper methods inside the migrations, you'll find that you can't directly import yourapp.migrations.0001_some_migration because identifiers can't start with numbers. Use something like migration_0001 = importlib.import_module('yourapp.migrations.0001_some_migration') instead of an import statement.
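Both hints can be tried in a self-contained sketch. It builds a throwaway package so it runs on its own; `demo_migrations`, `insert_defaults`, `FakeManager`, and `FakeModel` are hypothetical names, and in a real project the module would live in yourapp.migrations and the model would come from South's orm or from models.py.

```python
import importlib
import pathlib
import sys
import tempfile

# Create a temporary package containing a module whose name starts with digits.
root = pathlib.Path(tempfile.mkdtemp())
pkg = root / "demo_migrations"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "0001_some_migration.py").write_text(
    "def insert_defaults(model):\n"
    "    # Works with any class exposing objects.create(), frozen or current.\n"
    "    return model.objects.create(name='default')\n"
)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# 'import demo_migrations.0001_some_migration' would be a SyntaxError;
# import_module takes the dotted path as a plain string instead.
migration_0001 = importlib.import_module("demo_migrations.0001_some_migration")


class FakeManager:
    # Minimal stand-in for a model manager, for illustration only.
    def __init__(self):
        self.rows = []

    def create(self, **fields):
        self.rows.append(fields)
        return fields


class FakeModel:
    objects = FakeManager()


# The helper receives the model class, as a test's setUp() would pass it.
created = migration_0001.insert_defaults(FakeModel)
```

Because the helper only touches the class it is given, the migration and the test can share it without either one importing the other's model.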
