backing up django database with dumpdata - django

For testing, I would like to be able to save and restore app state. This would seem to be a very common requirement!
I find that I have to do
python manage.py dumpdata --exclude=contenttypes --exclude=auth > sitedata.json
in order for loaddata (after flush) not to complain about uniqueness violations and such.
At present this is just a magic incantation for me that I found in online searches. I don't find the explanations comprehensible.
I would like to know: first, why I have to exclude auth; second, what contenttypes even is, as well as why I have to exclude it. My concern is not that I can't do what I need to do now, but that I don't understand it and wonder if there are other corners of this procedure waiting to bite me.
Thanks for any information or links.

My experience using dumpdata for database backups was not very positive, and it apparently wasn't designed for that. I ended up writing my own management command that calls PostgreSQL's pg_dump. I would recommend using your database's own dump tool directly.
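A minimal sketch of the pg_dump approach, not the answerer's actual command. It assumes a PostgreSQL database described by a dict shaped like one entry of Django's DATABASES setting, and the helper names build_pg_dump_command and run_backup are hypothetical:

```python
import os
import subprocess


def build_pg_dump_command(db, out_path):
    """Build a pg_dump argument list from a Django-style DATABASES entry."""
    cmd = ["pg_dump", "--format=custom", "--file", out_path]
    if db.get("HOST"):
        cmd += ["--host", db["HOST"]]
    if db.get("PORT"):
        cmd += ["--port", str(db["PORT"])]
    if db.get("USER"):
        cmd += ["--username", db["USER"]]
    cmd.append(db["NAME"])  # the database name is the final positional argument
    return cmd


def run_backup(db, out_path):
    """Run pg_dump, handing the password over via the PGPASSWORD variable."""
    env = dict(os.environ)
    if db.get("PASSWORD"):
        env["PGPASSWORD"] = db["PASSWORD"]
    # check=True raises CalledProcessError if pg_dump exits with an error
    subprocess.run(build_pg_dump_command(db, out_path), env=env, check=True)
```

Inside a management command, run_backup could be called with settings.DATABASES["default"]. A dump in the custom format is restored with pg_restore.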

You exclude auth because it may contain sensitive data, such as users' email addresses or passwords, so there is no real need to back it up for testing purposes. A tester can create their own user (besides, they can't even know the password of a backed-up user, because passwords are stored hashed).
"contenttypes" is a table created by the content types framework that ships with Django:
Instances of ContentType represent and store information about the
models installed in your project, and new instances of ContentType are
automatically created whenever new models are installed.
It is created automatically and stores, so to speak, the "types" of all your models, translated into simple numbers. This is for advanced usage. I don't know exactly why it is recommended to omit it when dumping data. Maybe the table isn't used in every project, so there is no real need to back it up, or maybe it is to reduce the size of the backup file. Or maybe, when the tester runs migrations on their own machine, the contenttypes table will already be created and populated, so there is no need to pass that data along.
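The last guess is probably the closest one: Django recreates the content type and permission rows itself after migrate and after flush, so dumped copies of those rows can collide with the fresh ones. As a rough illustration of what --exclude achieves, here is a sketch that removes whole apps from an existing JSON fixture after the fact. It relies only on the fixture layout (a list of records whose "model" value is "app_label.model_name"), and the function name strip_apps is hypothetical:

```python
import json


def strip_apps(fixture_text, excluded_apps):
    """Drop every fixture record whose model belongs to an excluded app."""
    records = json.loads(fixture_text)
    kept = [
        rec for rec in records
        # "auth.user" -> "auth", "contenttypes.contenttype" -> "contenttypes"
        if rec["model"].split(".", 1)[0] not in excluded_apps
    ]
    return json.dumps(kept, indent=2)


if __name__ == "__main__":
    # sitedata.json is the file name used in the question
    with open("sitedata.json") as fh:
        cleaned = strip_apps(fh.read(), {"contenttypes", "auth"})
    with open("sitedata_clean.json", "w") as fh:
        fh.write(cleaned)
```

Note that records in other apps may still point at content types or users by primary key, which is one of the corners that can bite later.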

Related

Making backup of db skipping part of data in django [duplicate]

Is it possible to selectively filter which records Django's dumpdata management command outputs? I have a few models, each with millions of rows, and I only want to dump records in one model fitting a specific criteria, as well as all foreign-key linked records referencing any of those records.
Consider this use-case. Say I had a production database where my User model has millions of records. I have several other models (Log, Transaction, Purchase, Bookmarks, etc) all referencing the User model. I want to do development on my Django app, and I want to test using realistic data. However, my production database is so enormous, I can't realistically take a snapshot of the entire thing and load it locally. So ideally, I'd want to use dumpdata to dump 50 random User records, and all related records to JSON, and use that to populate a development database.
Is there an easy way to accomplish this?
I think django-fixture-magic might be worth a look at.
You'll find some additional background info in Scrubbing your Django database.
This snippet might be helpful for you (it follows relationships and serializes them):
http://djangosnippets.org/snippets/918/
You could also use that management command and override the default managers of whichever models you like so that they return custom querysets.
This isn't a simple answer to my question, but I found some interesting docs on Django's built-in natural keys feature, which would allow representing serialized records without the primary key. Unfortunately, it doesn't look like this is fully integrated into dumpdata, and there's an old outstanding ticket to fully rely on natural keys.
It also seems the serializers.serialize() function allows serialization of an arbitrary list of specific model instances.
Presumably, if I implemented a natural_key() method on all my models, and then called serializers.serialize("json", Users.objects.filter(criteria)), it should come close to accomplishing what I want. I might have to write a function to crawl all the FK references, and include those in the list of objects passed to serialize().
This is a very old question, but I recently wrote a custom management command to do just that. It looks very similar to the existing dumpdata command except that it takes some extra arguments to define how I want to filter the querysets and it overrides the get_objects function to perform the actual filtering:
from itertools import chain

def get_objects(dump_attributes, dump_values):
    # `options` is assumed to be the options dict received by the command's handle()
    qs_1 = ModelClass1.objects.filter(**options["filter_options_for_model_class_1"])
    qs_2 = ModelClass2.objects.filter(**options["filter_options_for_model_class_2"])
    # ...repeat for as many different model classes you want to dump...
    yield from chain(qs_1, qs_2)  # add any further querysets here
I had the same problem, but I didn't want to add another package, the snippet still didn't let me filter my data, and I only wanted a temporary solution.
So I thought to myself: why not override the default manager, apply my filter there, take the dump, and then revert my code? This is of course hacky and dangerous, but in my case it made sense.
Yes, I had to edit code with vim on the live server, but you don't need to reload the server: a command run through manage.py uses the code currently on disk, so from the end user's perspective the server remained basically untouched.
from django.db import models
from django.db.models import Manager

class DahlBookManager(Manager):
    def get_queryset(self):
        # dumpdata goes through the default manager, so it only sees these rows
        return super().get_queryset().filter(is_edited=False)

class FriendshipQuestion(models.Model):
    objects = DahlBookManager()
Running the dumpdata command then did exactly what I needed, which in my case was returning all the unedited questions.
Then I ran git checkout mymodelfile.py to revert the file to the original.
This is by no means a good solution, but it will get somebody either fired or unstuck.
With dumpdata (checked here against the Django 3.2 docs), you can dump a specific app and/or model. For example, for an app named customer:
python manage.py dumpdata customer
or, to dump a model named shoppingcart within the customer app:
python manage.py dumpdata customer.shoppingcart
There are many options for dumpdata, including a choice of output format and the handling of custom managers on models. For example:
python manage.py dumpdata customer --all --indent 4 --output my_fixtures.json
The options:
--all: use Django's base manager, so records are dumped even if a custom manager on the model would filter them out
--indent: number of spaces to indent the output by
--output: write to the given file instead of stdout. The default format is JSON.
See the docs at:
https://docs.djangoproject.com/en/3.2/ref/django-admin/#dumpdata

Django Accessing external Database to get data into project database

I'm looking for a "best-practice" guide/solution to the following situation.
I have a Django project with a MySQL DB which I created and manage. Every 5 minutes I have to import data from a second (external, not managed by me) DB in order to do some actions. I have read rights for the external DB and all the necessary information.
I have read the Django docs on using multiple databases (in short: register the DB in settings.py, migrate using the --database flag, and query/access data by routing to the DB) and multiple questions on this matter on Stack Overflow.
So my plan is:
Register the second database in settings.py, use inspectdb to generate the models, migrate, and define a method which reads data from the external DB and adds it to the internal (own) DB.
However I do have some questions:
Do I have to register the external DB if I don't manage it?
(Most probably yes, in order to use the ORM or cursors to access the data.)
How can I migrate the models if I don't manage the DB and don't have write permissions? I also don't need all the tables (there are around 250, but only 5 are needed).
(Is a fake migration an option worth considering? I would use inspectdb and migrate only the necessary tables.)
Because I only need to retrieve data from the external DB and not write back, would it suffice to have a method that constantly gets the latest data, like the second solution suggested in this answer?
Any thoughts/ideas/suggestions are welcomed!
I would not use Django's ORM for it, but rather just access the DB with psycopg2 and SQL, get the columns you care about into dicts, and work with those. Otherwise any minor change to that external DB's tables may break your Django app, because the models don't match anymore. That could create more headaches than an ORM is worth.
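A minimal sketch of that "plain SQL into dicts" idea. It sticks to the generic DB-API cursor interface that psycopg2 and the MySQL drivers share, and uses the standard library's sqlite3 purely as a stand-in so the example runs anywhere. The function name fetch_dicts and the orders table are hypothetical:

```python
import sqlite3


def fetch_dicts(conn, query, params=()):
    """Run a read-only query and return each row as a {column: value} dict."""
    cur = conn.cursor()
    try:
        cur.execute(query, params)
        # cursor.description holds one entry per column; item 0 is its name
        columns = [col[0] for col in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()


if __name__ == "__main__":
    # Stand-in for the external database; a real setup would connect with its own driver
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "new"), (2, "paid")])
    # Placeholder style is driver-specific: sqlite3 uses ?, psycopg2 uses %s
    for row in fetch_dicts(conn, "SELECT id, status FROM orders WHERE id > ?", (0,)):
        print(row)
```

Selecting only the named columns you need keeps the import insulated from unrelated changes to the external tables, which is the point of the answer above.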

Archive data after every year

I have lots of models in my project like Advertisements, UserDetails etc. Currently I have to delete the entire database every year so as to not create any conflicts between this year data and previous year data.
I want to implement a feature that can allow me to switch between different years. What can be the best way to implement this?
I think you could switch schemas in PostgreSQL. It's not completely straightforward, and there are several ways to do it that you can look into. The way I did it was to set a default search path for the Django database user account (e.g. user2018, user2019, etc.) that only included the schema I wanted to use. I can't check the exact settings right now because my office network is down. From what I've read, you can also do it in settings.py or in each individual model using db_table, although both of those solutions seem more convoluted than using the search path.
You would have to shut down, change the database username in settings.py (or change the search path in PostgreSQL), switch the schema over to a new one, and then run migrate to create the tables again. If you have reference data in any of the tables, schema-to-schema copies are easy to do.
Try searching for change django database schema postgresql to see what options there are for specifying the schema.
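The settings.py variant mentioned above could look roughly like the sketch below. It assumes the PostgreSQL backend, where entries in OPTIONS are forwarded to the database driver and "options" is a libpq connection parameter. The archive_<year> schema naming and the function name database_config are hypothetical:

```python
def database_config(year, base):
    """Return a copy of a Django DATABASES entry pinned to one year's schema."""
    config = dict(base)
    options = dict(base.get("OPTIONS", {}))
    # Sent to the server at connect time, so unqualified table names
    # resolve inside the chosen schema only
    options["options"] = f"-c search_path=archive_{year}"
    config["OPTIONS"] = options
    return config


# Hypothetical base entry; real credentials would come from your own settings
BASE = {
    "ENGINE": "django.db.backends.postgresql",
    "NAME": "mysite",
    "USER": "django",
}

DATABASES = {"default": database_config(2019, BASE)}
```

The schema itself still has to exist (CREATE SCHEMA) before migrate can create the tables in it.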

How can I add new models and do migrations without restarting the server manually?

For the app I'm building I need to be able to create a new data model in models.py as fast as possible automatically.
I created a way to do this by making a separate Python program that opens models.py, edits it, closes it, and runs the migrations automatically, but there must be a better way.
edit: my method works on my local server but not on pythonanywhere
In the Django documentation, I found SchemaEditor, which is exactly what you want. Using the SchemaEditor, you can create models, delete models, add fields, delete fields, etc.
Here's an excerpt:
Django’s migration system is split into two parts; the logic for
calculating and storing what operations should be run
(django.db.migrations), and the database abstraction layer that turns
things like “create a model” or “delete a field” into SQL - which is
the job of the SchemaEditor.
Don't rewrite your models.py file automatically; that is not how it's meant to work. When you need more flexibility in the way you store data, you should do the following:
Think hard about what kind of data you want to store, and make your data model more abstract to fit more cases if needed.
Use JSON fields to store arbitrary JSON data with your model (e.g. for the Postgres database).
If that's not a fit, don't use Django's ORM; use a different store (e.g. Redis for key-value data or MongoDB for JSON documents).

Django unit-testing with loading fixtures for several dependent applications problems

I'm now writing unit tests for already existing code. I faced the following problem:
After running syncdb to create the test database, Django automatically fills several tables, like django_content_type or auth_permission.
Then, imagine I need to run a complex test, like checking user registration, that needs a lot of data tables and connections between them.
If I try to use my whole existing database for making fixtures (that would be rather convenient for me), I will receive an error like the one here. This happens because Django has already filled tables like django_content_type.
The next possible way is to use the dumpdata --exclude option for the tables already filled by syncdb. But this doesn't work well either, because if I take User and User Group objects from my DB and the User Permissions table that was automatically created by syncdb, I can receive errors, because the primary keys connecting them now point to the wrong rows. This is better described here in the part 'fixture hell', but the solution shown there doesn't look good.
The next possible scheme I see is this:
I run my tests; Django creates the test database, runs syncdb and creates all those tables.
In my test setup I drop this database and create a new blank database.
I load the data dump from the existing database, also in the test setup.
That's how the problem was solved:
After syncdb has created the test database, in the setUp part of the tests I use os.system to access the shell from my code. Then I just load the dump of the database which I want to use for the tests.
So it works like this: syncdb fills contenttype and some other tables with data. Then, in the setUp part of the tests, loading the SQL dump clears all the previously created data and I get a nice database.
Maybe not the best solution, but it works =)
My approach would be to first use South to make DB migrations easy (which doesn't help with this problem at all, but is nice), and then use a module of model creation methods.
When you run
$ manage.py test my_proj
Django with South installed will create the test DB and run all your migrations to give you a completely updated test DB.
To write tests, first create a Python module called test_model_factory.py. In it, create functions that create your objects.
from django.contrib.auth.models import User  # assuming Django's built-in User model

def mk_user():
    return User.objects.create(...)
Then in your tests you can import your test_model_factory module, and create objects for each test.
def test_something(self):
    test_user = test_model_factory.mk_user()
    self.assertIsNotNone(test_user)  # then assert on test_user ...