Making a backup of the db, skipping part of the data, in Django [duplicate]

Is it possible to selectively filter which records Django's dumpdata management command outputs? I have a few models, each with millions of rows, and I only want to dump records in one model fitting a specific criteria, as well as all foreign-key linked records referencing any of those records.
Consider this use-case. Say I had a production database where my User model has millions of records. I have several other models (Log, Transaction, Purchase, Bookmarks, etc) all referencing the User model. I want to do development on my Django app, and I want to test using realistic data. However, my production database is so enormous, I can't realistically take a snapshot of the entire thing and load it locally. So ideally, I'd want to use dumpdata to dump 50 random User records, and all related records to JSON, and use that to populate a development database.
Is there an easy way to accomplish this?

I think django-fixture-magic might be worth a look.
You'll find some additional background info in Scrubbing your Django database.

This snippet might be helpful for you (it follows relationships and serializes them):
http://djangosnippets.org/snippets/918/
You could also use that management command and override the default managers for whichever models you would like to return custom querysets for.

This isn't a simple answer to my question, but I found some interesting docs on Django's built-in natural keys feature, which would allow representing serialized records without the primary key. Unfortunately, it doesn't look like this is fully integrated into dumpdata, and there's an old outstanding ticket to fully rely on natural keys.
It also seems the serializers.serialize() function allows serialization of an arbitrary list of specific model instances.
Presumably, if I implemented a natural_key() method on all my models and then called serializers.serialize("json", User.objects.filter(criteria)), it should come close to accomplishing what I want. I might have to write a function to crawl all the FK references and include those in the list of objects passed to serialize().
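As a rough sketch of that idea (the model names, the "50 random users" sample, and the reverse-FK crawl below are my assumptions, not something the serializer does for you): grab a sample of users, collect every row that points back at them through a reverse foreign key, and serialize the whole lot.
from itertools import chain
from django.core import serializers

users = list(User.objects.order_by("?")[:50])  # 50 random users

# Follow reverse FK accessors (log_set, purchase_set, ...) for each sampled user;
# reverse one-to-one relations would need separate handling.
related = chain.from_iterable(
    getattr(u, rel.get_accessor_name()).all()
    for u in users
    for rel in u._meta.related_objects
    if rel.one_to_many
)

payload = serializers.serialize("json", chain(users, related),
                                use_natural_foreign_keys=True)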

This is a very old question, but I recently wrote a custom management command to do just that. It looks very similar to the existing dumpdata command except that it takes some extra arguments to define how I want to filter the querysets and it overrides the get_objects function to perform the actual filtering:
from itertools import chain

def get_objects(dump_attributes, dump_values):
    # "options" holds the command's parsed arguments, available in the
    # surrounding management command
    qs_1 = ModelClass1.objects.filter(**options["filter_options_for_model_class_1"])
    qs_2 = ModelClass2.objects.filter(**options["filter_options_for_model_class_2"])
    # ...repeat for as many different model classes as you want to dump...
    yield from chain(qs_1, qs_2)

I had the same problem, but I didn't want to add another package, the snippet still didn't let me filter my data, and I just wanted a temporary solution.
So I thought to myself: why not override the default manager, apply my filter there, take the dump, and then revert my code back? This is of course hacky and dangerous, but in my case it made sense.
Yes, I had to vim the code on the live server, but you don't need to reload the server: running a command through manage.py uses your current code base, so from the end-user perspective the server basically remained untouched.
from django.db import models
from django.db.models import Manager

class DahlBookManager(Manager):
    def get_queryset(self):
        # Only expose unedited questions to dumpdata
        return super().get_queryset().filter(is_edited=False)

class FriendshipQuestion(models.Model):
    # ...fields...
    objects = DahlBookManager()
Then running the dumpdata command did exactly what I needed, which in my case was returning all the unedited questions.
Afterwards I ran git checkout mymodelfile.py to revert it back to the original.
This is by no means a good solution, but it will get somebody either fired or unstuck.
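With that manager in place, a plain dumpdata call for the model picks up the filter (the app label below is an illustrative assumption); note that passing --all would bypass the custom manager and dump everything:
python manage.py dumpdata myapp.FriendshipQuestion --indent 2 > unedited_questions.json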

You can use dumpdata to dump a specific app and/or model (the examples below follow the Django 3.2 docs). For example, for an app named customer:
python manage.py dumpdata customer
or, to dump a model named shoppingcart within the customer app:
python manage.py dumpdata customer.shoppingcart
There are many options with dumpdata, including writing to several output file formats and handling custom managers on models. For example:
python manage.py dumpdata customer --all --indent 4 --output my_fixtures.json
The options:
--all: dump all records, even if you use a custom manager on the model
--indent: number of spaces of indentation to use when writing to the file
--output: send output to a file instead of stdout; the default format is JSON
See the docs at:
https://docs.djangoproject.com/en/3.2/ref/django-admin/#dumpdata

Related

backing up django database with dumpdata

For testing, I would like to be able to save and restore app state. This would seem to be a very common requirement!
I find that I have to do
python manage.py dumpdata --exclude=contenttypes --exclude=auth > sitedata.json
in order for loaddata (after flush) not to complain about uniqueness violations and such.
At present this is just a magic incantation for me that I found in online searches. I don't find the explanations comprehensible.
I would like to know: first, why I have to exclude auth; second, what contenttypes even is, as well as why I have to exclude it. My concern is not that I can't do what I need to do now, but that I don't understand it and wonder if there are other corners of this procedure waiting to bite me.
Thanks for any information or links.
I had a not-so-positive experience using dumpdata for database backups, and apparently it wasn't designed for that. I ended up writing my own management command that calls PostgreSQL's pg_dump command. I would recommend using your database's dump tool directly.
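A minimal sketch of such a command, assuming PostgreSQL with the default database alias and pg_dump on the PATH; the command name, output file name, and --output option are illustrative, not from the original answer:
# myapp/management/commands/pg_backup.py
import os
import subprocess

from django.conf import settings
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = "Back up the default database with pg_dump"

    def add_arguments(self, parser):
        parser.add_argument("--output", default="backup.dump")

    def handle(self, *args, **options):
        db = settings.DATABASES["default"]
        env = {**os.environ, "PGPASSWORD": db["PASSWORD"]}  # non-interactive auth
        subprocess.run(
            [
                "pg_dump",
                "--format=custom",          # restore later with pg_restore
                "--file", options["output"],
                "--host", db.get("HOST") or "localhost",
                "--username", db["USER"],
                db["NAME"],
            ],
            env=env,
            check=True,
        )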
You exclude auth because there might be sensitive data there, like users' emails or passwords, so there is no real need to back up that data for testing purposes: a tester can create their own user (more than that, they can't even know the password of your backed-up users, since passwords are hashed).
contenttypes is a table created by Django's built-in content types framework.
Instances of ContentType represent and store information about the
models installed in your project, and new instances of ContentType are
automatically created whenever new models are installed.
It is created automatically and stores, so to speak, the "types" of all models, mapped to simple numbers. This is for advanced usage. I don't know exactly why it is recommended to omit it when dumping data. Maybe the table is not used in all projects, so there is no real need to back it up, or maybe it is just to reduce the backup file size. Or maybe because when the tester runs migrations on their machine, the contenttypes table will already be populated, so there is no need to pass that data along.

How can I add new models and do migrations without restarting the server manually?

For the app I'm building I need to be able to create a new data model in models.py as fast as possible automatically.
I created a way to do this by making a separate Python program that opens models.py, edits it, closes it, and runs the migrations automatically, but there must be a better way.
Edit: my method works on my local server but not on PythonAnywhere.
In the Django documentation, I found SchemaEditor, which is exactly what you want. Using the SchemaEditor, you can create models, delete models, add fields, delete fields, etc.
Here's an excerpt:
Django’s migration system is split into two parts; the logic for
calculating and storing what operations should be run
(django.db.migrations), and the database abstraction layer that turns
things like “create a model” or “delete a field” into SQL - which is
the job of the SchemaEditor.
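To make that concrete, here is a minimal sketch of driving SchemaEditor at runtime; the model and field names are illustrative assumptions, not from the question:
from django.db import connection, models

class Book(models.Model):
    title = models.CharField(max_length=100)

    class Meta:
        app_label = "myapp"  # needed because this model is defined outside models.py

# Create the table for the new model...
with connection.schema_editor() as schema_editor:
    schema_editor.create_model(Book)

# ...and later add a column to it
author = models.CharField(max_length=100, default="")
author.set_attributes_from_name("author")
with connection.schema_editor() as schema_editor:
    schema_editor.add_field(Book, author)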
Don't rewrite your models.py file automatically; that is not how it's meant to work. When you need more flexibility in the way you store data, you should do the following:
Think hard about what kind of data you want to store and make your data model more abstract to fit more cases, if needed.
Use JSON fields to store arbitrary JSON data with your model (e.g. with the Postgres database); see the sketch after this list.
If that's not a fit, don't use Django's ORM and use a different store (e.g. Redis for key-value data or MongoDB for JSON documents).
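A minimal sketch of the JSONField option, with illustrative model and field names:
from django.db import models

class Event(models.Model):
    name = models.CharField(max_length=100)
    # Arbitrary, schema-less payload lives here instead of new columns
    extra = models.JSONField(default=dict)

Usage (illustrative):
Event.objects.create(name="signup", extra={"utm_source": "newsletter", "plan": "pro"})
Event.objects.filter(extra__plan="pro")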

django dumpdata ORM + mongoengine

I'm not sure how to handle the following case (thus my question, obviously).
I have a Django setup with PostgreSQL that contains all the Django model data, but I also have mongoengine managing (let's call it) extended data.
I also have a circular reference between the two (mongo_id points from the Django model to the mongoengine document PK, and db_id points from the mongoengine document to the Django model PK).
Obviously, if I run dumpdata, I only get the Django model data. How can I make it also dump the data from mongoengine? Is there a way for me to achieve this?
This is to get a backup of the data. Backing up referenced files can easily be done by just grabbing the files on disk.
I did not define another entry in DATABASES in the settings.py file (mainly because I was not required to). Is that what I need to do?
Thanks for any pointers.
As a bonus, I would appreciate it if I could see those mongoengine documents in the admin interface alongside the base Django models.
First of all, you can dump your data using mongodump.
In one project we had to move data from one database to another with a significantly different schema, so we created a management command that would do that. If you want to use it in a similar manner, it has the advantage of moving only data that is valid for your current Document definitions and leaving out any possible leftovers from older ones.
The dumping management command should contain something like
from bson import json_util

# to_mongo() returns the raw document for each object; json_util knows how to
# serialize BSON types (ObjectId, dates, ...) that plain json would choke on
json_util.dumps([doc.to_mongo() for doc in SomeDocument.objects.all()])
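Wrapping that in a management command that writes the dump to a file might look like the sketch below; the command name, output file name, and the SomeDocument import path are illustrative assumptions:
from bson import json_util
from django.core.management.base import BaseCommand

from myapp.documents import SomeDocument  # wherever your Document is defined

class Command(BaseCommand):
    help = "Dump all SomeDocument objects from MongoDB to a JSON file"

    def handle(self, *args, **options):
        docs = [doc.to_mongo() for doc in SomeDocument.objects.all()]
        with open("mongo_dump.json", "w") as fh:
            fh.write(json_util.dumps(docs, indent=2))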

Django unit-testing with loading fixtures for several dependent applications problems

I'm now writing unit tests for already existing code. I face the following problem:
After running syncdb to create the test database, Django automatically fills several tables like django_content_type or auth_permissions.
Then, imagine I need to run a complex test, like checking user registration, which will need a lot of data tables and connections between them.
If I try to use my whole existing database for making fixtures (which would be rather convenient for me), I receive an error like here. This happens because Django has already filled tables like django_content_type.
The next possible way is to use the django dumpdata --exclude option for the tables already filled by syncdb. But this doesn't work well either, because if I take User and User Group objects from my db along with the User Permissions table that was automatically created by syncdb, I can get errors because the primary keys connecting them now point to the wrong rows. This is better described here in the 'fixture hell' part, but the solution shown there doesn't look good.
The next possible scheme I see is this:
I run my tests; Django creates the test database, runs syncdb and creates all those tables.
In my test setup I drop this database and create a new blank database.
Also in the test setup, I load a data dump from the existing database.
That's how the problem was solved:
After syncdb has created the test database, in the setUp part of the tests I use os.system to access the shell from my code. Then I just load the dump of the database that I want to use for the tests.
So it works like this: syncdb fills contenttypes and some other tables with data; then, in the setUp part of the tests, loading the SQL dump clears all the previously created data and I get a clean database.
Maybe not the best solution, but it works =)
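A sketch of the setUp described above; the test database name and dump path are illustrative assumptions, not from the original answer:
import os
from django.test import TransactionTestCase

class RegistrationTests(TransactionTestCase):
    def setUp(self):
        # Overwrite whatever syncdb created with a known SQL dump of real data
        os.system("psql test_mydb < /path/to/dump.sql")

    def test_registration(self):
        ...  # exercise the registration flow against the dumped data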
My approach would be to first use South to make DB migrations easy (which doesn't help with this at all, but is nice), and then use a module of model-creation methods.
When you run
$ manage.py test my_proj
Django with South installed will create the test DB and run all your migrations to give you a completely up-to-date test db.
To write tests, first create a Python module called test_model_factory.py. In it, create functions that create your objects.
from django.contrib.auth.models import User  # or whichever User model you use

def mk_user():
    # Return the object so tests can make assertions about it
    return User.objects.create(...)  # fill in whatever fields your tests need
Then in your tests you can import your test_model_factory module, and create objects for each test.
def test_something(self):
    test_user = test_model_factory.mk_user()
    self.assertIsNotNone(test_user)  # ...assert whatever your test needs about test_user

Can I use a database view as a model in Django?

I'd like to use a view I've created in my database as the source for my Django view.
Is this possible without using custom SQL?
UPDATE (13/02/09):
Like many of the answers suggest, you can just make your own view in the database and then use it within the API by defining it in models.py.
Some warnings though:
manage.py syncdb will not work anymore
the view needs the same prefix at the start of its name as all the other models' tables, e.g. if your app is called "thing" then your view will need to be called thing_$viewname
Just an update for those who'll encounter this question (from Google or whatever else)...
Currently Django has a simple "proper way" to define a model without managing its database table:
Options.managed
Defaults to True, meaning Django will create the appropriate database tables in syncdb and remove them as part of a reset management command. That is, Django manages the database tables' lifecycles.
If False, no database table creation or deletion operations will be performed for this model. This is useful if the model represents an existing table or a database view that has been created by some other means. This is the only difference when managed is False. All other aspects of model handling are exactly the same as normal.
Since Django 1.1, you can use Options.managed for that.
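A minimal sketch of a model backed by an existing view using managed = False; the model, field, and view names are illustrative assumptions:
from django.db import models

class SalesSummary(models.Model):
    product = models.CharField(max_length=100)
    total = models.DecimalField(max_digits=12, decimal_places=2)

    class Meta:
        managed = False                    # Django will never create or drop this "table"
        db_table = "myapp_sales_summary"   # name of the existing database view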
For older versions, you can easily define a Model class for a view and use it like your other models. I just tested it using a SQLite-based app and it seems to work fine. Just make sure to add a primary key field if your view's "primary key" column is not named 'id', and specify the view's name in the Meta options if your view is not called 'app_classname'.
The only problem is that the syncdb command will raise an exception, since Django will try to create the table. You can prevent that by defining the 'view models' in a separate Python file, different from models.py. This way, Django will not see them when it introspects models.py to determine the models to create for the app, and therefore will not attempt to create the table.
I just implemented a model using a view with postgres 9.4 and django 1.8.
I created custom migration classes like this:
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

from django.db import migrations


class Migration(migrations.Migration):

    dependencies = [
        ('myapp', '0002_previousdependency'),
    ]

    sql = """
    create VIEW myapp_myview as
    select your view here
    """

    operations = [
        migrations.RunSQL("drop view if exists myapp_myview;"),
        migrations.RunSQL(sql),
    ]
I wrote the model as I normally would. It works for my purposes.
Note- When I ran makemigrations a new migration file was created for the model, which I manually deleted.
Full disclosure- my view is read only because I am using a view derived from a jsonb data type and have not written an ON UPDATE INSTEAD rule.
We've done this quite extensively in our applications with MySQL to work around the single database limitation of Django. Our application has a couple of databases living in a single MySQL instance. We can achieve cross-database model joins this way as long as we have created views for each table in the "current" database.
As far as inserts/updates into views go, with our use cases, a view is basically a "select * from [db.table];". In other words, we don't do any complex joins or filtering so insert/updates trigger from save() work just fine. If your use case requires such complex joins or extensive filtering, I suspect you won't have any problems for read-only scenarios, but may run into insert/update issues. I think there are some underlying constraints in MySQL that prevent you from updating into views that cross tables, have complex filters, etc.
Anyway, your mileage may vary if you are using an RDBMS other than MySQL, but Django doesn't really care whether it's sitting on top of a physical table or a view. It's going to be the RDBMS that determines whether it actually functions as you expect. As a previous commenter noted, you'll likely be throwing syncdb out the window, although we successfully worked around it with a post-syncdb signal that drops the physical table created by Django and runs our "create view..." command. However, the post-syncdb signal is a bit esoteric in the way it gets triggered, so caveat emptor there as well.
EDIT: Of course by "post-syncdb signal" I mean "post-syncdb listener"
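A rough sketch of that workaround, using the pre-1.7 post_syncdb signal API; the model, table, and view names are illustrative assumptions:
from django.db import connection
from django.db.models.signals import post_syncdb

import myapp.models

def swap_table_for_view(sender, **kwargs):
    cursor = connection.cursor()
    # Drop the physical table syncdb just created, then recreate it as a view
    cursor.execute("DROP TABLE IF EXISTS myapp_report")
    cursor.execute("CREATE VIEW myapp_report AS SELECT * FROM otherdb.report")

post_syncdb.connect(swap_table_for_view, sender=myapp.models)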
Following the official Django documentation, you can query the view with a raw SQL cursor like this:
# Import the connection object
from django.db import connection

# Create the cursor
cursor = connection.cursor()

# Write the SQL code
sql_string = 'SELECT * FROM myview'

# Execute the SQL and fetch the results
cursor.execute(sql_string)
result = cursor.fetchall()
Hope it helps ;-)