django dumpdata ORM + mongoengine

I'm not sure how to handle the following case (thus my question, obviously).
I have a Django setup with PostgreSQL containing all the Django model data, but I also have mongoengine managing (let's call it) extended data.
I also have a circular reference between the two (mongo_id points from django model to mongoengine document PK, and db_id points from mongoengine to django model PK).
Obviously, if I run dumpdata, I only get the Django model data. How can I make it also dump the data from mongoengine? Is there a way to achieve this?
This is to get a backup of the data. Backup of referenced files can be easily done by just grabbing the file on disk.
I did not define another DATABASES in the settings.py file (mainly because I was not required to). Is that what I need to do?
Thanks for any pointers.
As a bonus, I would appreciate it if I could see those mongoengine documents in the admin interface, alongside the base Django models.

First of all, you can dump your data using mongodump.
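For example (the database name and output path are placeholders):
mongodump --db your_mongo_db --out /path/to/backup/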
In one project we had to move data from one database to another with a significantly different schema, so we created a management command to do that. If you use it in a similar manner, it has the advantage of moving only data that is valid for your current Document definitions, leaving out any possible leftovers from older ones.
The dumping management command should contain something like:
from bson import json_util
json_util.dumps([doc.to_mongo() for doc in SomeDocument.objects.all()])
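Fleshed out, a minimal sketch of such a command might look like this (the document class and its import path are placeholders for your own mongoengine documents):

from bson import json_util
from django.core.management.base import BaseCommand

from myapp.documents import SomeDocument  # hypothetical import path


class Command(BaseCommand):
    help = "Dump mongoengine documents to a JSON file"

    def add_arguments(self, parser):
        parser.add_argument("--output", default="mongo_dump.json")

    def handle(self, *args, **options):
        # to_mongo() returns the raw BSON document, which json_util can serialize
        docs = [doc.to_mongo() for doc in SomeDocument.objects.all()]
        with open(options["output"], "w") as fh:
            fh.write(json_util.dumps(docs, indent=2))
        self.stdout.write("Dumped %d documents" % len(docs))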

Related

Making backup of db skipping part of data in django [duplicate]

Is it possible to selectively filter which records Django's dumpdata management command outputs? I have a few models, each with millions of rows, and I only want to dump records in one model fitting a specific criteria, as well as all foreign-key linked records referencing any of those records.
Consider this use-case. Say I had a production database where my User model has millions of records. I have several other models (Log, Transaction, Purchase, Bookmarks, etc) all referencing the User model. I want to do development on my Django app, and I want to test using realistic data. However, my production database is so enormous, I can't realistically take a snapshot of the entire thing and load it locally. So ideally, I'd want to use dumpdata to dump 50 random User records, and all related records to JSON, and use that to populate a development database.
Is there an easy way to accomplish this?
I think django-fixture-magic might be worth a look.
You'll find some additional background info in Scrubbing your Django database.
This snippet might be helpful for you (it follows relationships and serializes them):
http://djangosnippets.org/snippets/918/
You could also use that management command and override the default managers on whichever models you would like to return custom querysets for.
This isn't a simple answer to my question, but I found some interesting docs on Django's built-in natural keys feature, which would allow representing serialized records without the primary key. Unfortunately, it doesn't look like this is fully integrated into dumpdata, and there's an old outstanding ticket to fully rely on natural keys.
It also seems the serializers.serialize() function allows serialization of an arbitrary list of specific model instances.
Presumably, if I implemented a natural_key() method on all my models, and then called serializers.serialize('json', Users.objects.filter(criteria)), it should come close to accomplishing what I want. I might have to write a function to crawl all the FK references, and include those in the list of objects passed to serialize().
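For instance, a rough sketch of that approach (the model names, field names, and filter are made up; natural keys also require a natural_key() method on the model and a get_by_natural_key() method on its manager):

from django.core import serializers
from myapp.models import User, Log  # hypothetical models

# grab a small random subset of users, then pull in one FK relation by hand
users = list(User.objects.order_by("?")[:50])
logs = list(Log.objects.filter(user__in=users))

data = serializers.serialize(
    "json",
    users + logs,
    use_natural_foreign_keys=True,  # serialize FKs via natural keys
    use_natural_primary_keys=True,  # omit PKs where natural keys exist
)
with open("subset_fixture.json", "w") as fh:
    fh.write(data)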
This is a very old question, but I recently wrote a custom management command to do just that. It looks very similar to the existing dumpdata command except that it takes some extra arguments to define how I want to filter the querysets and it overrides the get_objects function to perform the actual filtering:
from itertools import chain

def get_objects(self, *args, **options):
    # "options" holds the command's parsed arguments, including the custom filter arguments
    qs_1 = ModelClass1.objects.filter(**options["filter_options_for_model_class_1"])
    qs_2 = ModelClass2.objects.filter(**options["filter_options_for_model_class_2"])
    # ...repeat for as many different model classes as you want to dump...
    yield from chain(qs_1, qs_2)
I had the same problem, but I didn't want to add another package, the snippet still didn't let me filter my data, and I just wanted a temporary solution.
So I thought to myself: why not override the default manager, apply my filter there, take the dump, and then revert my code back? This is of course hacky and dangerous, but in my case it made sense.
Yes, I had to vim the code on a live server, but you don't need to reload the server, since a command run through manage.py uses your current code base; from the end-user's perspective the server basically remained untouched.
from django.db import models
from django.db.models import Manager

class DahlBookManager(Manager):
    def get_queryset(self):
        return super().get_queryset().filter(is_edited=False)

class FriendshipQuestion(models.Model):
    # ...fields...
    objects = DahlBookManager()
Then running the dumpdata command did exactly what I needed, which in my case was returning all the unedited questions.
Afterwards I ran git checkout mymodelfile.py to revert it back to the original.
This is by no means a good solution, but it will get somebody either fired or unstuck.
You can use dumpdata to dump a specific app and/or model (the examples below use Django 3.2). For example, for an app named customer:
python manage.py dumpdata customer
or, to dump a model named shoppingcart within the customer app:
python manage.py dumpdata customer.shoppingcart
There are many options with dumpdata, including writing to several output file formats and handling custom managers on models. For example:
python manage.py dumpdata customer --all --indent 4 --output my_fixtures.json
The options:
--all: dumps the records even if you use a custom manager on the model
--indent: the number of spaces to indent by when writing to a file
--output: sends output to a file instead of stdout; the default format is JSON
See the docs at:
https://docs.djangoproject.com/en/3.2/ref/django-admin/#dumpdata

Django Accessing external Database to get data into project database

I'm looking for a "best-practice" guide/solution to the following situation.
I have a Django project with a MySQL DB which I created and manage. I have to import data every 5 minutes from a second (external, not managed by me) DB in order to do some actions. I have read rights for the external DB and all the necessary information.
I have read the Django docs regarding the usage of multiple databases (register the DB in settings.py, migrate using the --database flag, query/access data by routing to the DB, in short) and multiple questions on this matter on Stack Overflow.
So my plan is:
Register the second database in settings.py, use inspectdb to add the tables to my models, migrate, and define a method which reads data from the external DB and adds it to the internal (own) DB.
However I do have some questions:
Do I have to register the external DB if I don't manage it?
(Most probably yes, in order to use the ORM or cursors to access the data.)
How can I migrate the models if I don't manage the DB and don't have write permissions? I also don't need all the tables (around 250, but only 5 are needed).
(Is a fake migration an option worth considering? I would use inspectdb and migrate only the necessary tables.)
Because I only need to retrieve data from the external DB and not write back, would it suffice to have a method that periodically gets the latest data, like the second solution suggested in this answer?
Any thoughts/ideas/suggestions are welcomed!
I would not use Django's ORM for it, but rather just access the DB with psycopg2 and SQL, get the columns you care about into dicts, and work with those. Otherwise any minor change to that external DB's tables may break your Django app, because the models don't match anymore. That could create more headaches than an ORM is worth.
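A rough sketch of that approach, assuming the external database is PostgreSQL (the connection details, table, and column names below are placeholders; use the matching driver, e.g. mysqlclient, if it isn't Postgres):

import psycopg2
import psycopg2.extras

def fetch_external_rows():
    # hypothetical connection details; in practice read them from settings or the environment
    conn = psycopg2.connect(
        host="external-db.example.com",
        dbname="external",
        user="readonly",
        password="secret",
    )
    try:
        with conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as cur:
            # only pull rows changed since the last run (here: the last 5 minutes)
            cur.execute(
                "SELECT id, name, updated_at FROM some_table "
                "WHERE updated_at > now() - interval '5 minutes'"
            )
            return cur.fetchall()  # list of dicts, one per row
    finally:
        conn.close()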

How can I add new models and do migrations without restarting the server manually?

For the app I'm building, I need to be able to create a new data model in models.py automatically, as fast as possible.
I created a way to do this by making a separate Python program that opens models.py, edits it, closes it, and runs the migrations automatically, but there must be a better way.
Edit: my method works on my local server but not on PythonAnywhere.
In the Django documentation, I found SchemaEditor, which is exactly what you want. Using the SchemaEditor, you can create models, delete models, add fields, delete fields, etc.
Here's an excerpt:
Django’s migration system is split into two parts; the logic for calculating and storing what operations should be run (django.db.migrations), and the database abstraction layer that turns things like “create a model” or “delete a field” into SQL - which is the job of the SchemaEditor.
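For illustration, a minimal sketch of what that looks like (the model here is hypothetical; a model defined outside a normal app module needs an explicit app_label):

from django.db import connection, models

class DynamicThing(models.Model):  # hypothetical, defined at runtime
    name = models.CharField(max_length=100)

    class Meta:
        app_label = "myapp"  # assumed app label

# create the table for this model directly, without writing a migration
with connection.schema_editor() as schema_editor:
    schema_editor.create_model(DynamicThing)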
Don't rewrite your models.py file automatically, that is not how it's meant to work. When you need more flexibility in the way you store data, you should do the following:
Think hard about what kind of data you want to store and make your data model more abstract to fit more cases, if needed.
Use JSON fields to store arbitrary JSON data with your model (e.g. on a Postgres database); a sketch follows this list.
If it's not a fit, don't use Django's ORM and use a different store instead (e.g. Redis for key-value data or MongoDB for JSON documents).
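A minimal sketch of the JSON-field option (the model and field names are invented; models.JSONField is available from Django 3.1, and older Postgres-only projects can use django.contrib.postgres.fields.JSONField):

from django.db import models

class Record(models.Model):  # hypothetical model
    created = models.DateTimeField(auto_now_add=True)
    payload = models.JSONField(default=dict)  # arbitrary, schema-less data

# usage: Record.objects.create(payload={"any": {"nested": "structure"}})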

How to get data from database without models in django?

I have been crawling around its docs, but they mostly cover using the database with models.
The problem is that my database is too large and I don't want to create any models, since it's a legacy one, and I will have to query different tables dynamically, so I just want to pull data from it. Is that possible in Django?
You can bypass the model layer and use SQL directly. However, you will have to process the results in Python, without the advantage of working with ORM objects.
https://docs.djangoproject.com/en/1.10/topics/db/sql/#executing-custom-sql-directly
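For example, a small sketch using Django's default connection (the table name is a placeholder, and since it is interpolated into the SQL it must come from a trusted source):

from django.db import connection

def fetch_table(table_name):
    # table_name must be trusted; it cannot be passed as a query parameter
    with connection.cursor() as cursor:
        cursor.execute("SELECT * FROM %s" % table_name)
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]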
As pointed out in a comment, Django provides a way to automatically generate the models from the legacy database with inspectdb.
This guide describes the few manual steps required to "clean" the automatically generated models.
While this doesn't directly answer the stated question of avoiding models, it does address your issue of not wanting to create them yourself, due to the large database.
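For example, inspectdb can be limited to just the tables you need (the table names below are placeholders), and the generated models are marked managed = False so Django won't try to migrate them:

python manage.py inspectdb > models.py
python manage.py inspectdb legacy_table_one legacy_table_two > legacy_models.py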
Data should be stored somewhere. There are a lot of ways to store data, but the most reliable one is a database (hence the name).
You could store data in a JSON file and save that. You could also store data in environment variables. You can even store data in a plain text file. None of those are recommended. I would just use a database, any type of database (MongoDB / Postgres / MySQL, anything). That's what it is meant for.

How to use dumpdata/loaddata in Django with self-referential tables?

I have a Django 1.7 app, developed against sqlite3, and I am moving to PostgreSQL. One model table called Place has a parent relation to itself. I used dumpdata to create a .json extract.
When I attempt to use loaddata to read the extract into the Postgres db, Django (not postgres) gives me:
django.core.serializers.base.DeserializationError: Problem installing fixture '/home/ugliest/project/thub/dbdump25jan_nat.json': Place matching query does not exist;
This tells me that django is validating the data before loading it into the database. Is there a way to allow this data in before validation? I know that Postgres has the ability to defer constraint checking, but even if I could trigger that from django, it wouldn't prevent django from validating. How to get the data in?
I feel like I'm missing something obvious. This seems like it would happen often, and have a known simple fix. Do I really have to write some custom mess to reorder the extract file?
Edit: FWIW, I never found an answer, and wrote a custom mess to reorder things.
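For anyone hitting the same wall, a rough sketch of what that reordering can look like, assuming a standard dumpdata JSON fixture and a self-referencing FK named parent on a model labelled myapp.place (all of these names are assumptions):

import json

def reorder_places(in_path, out_path):
    with open(in_path) as fh:
        objects = json.load(fh)

    places = [o for o in objects if o["model"] == "myapp.place"]
    others = [o for o in objects if o["model"] != "myapp.place"]

    # emit parents before children so loaddata never sees a dangling reference
    ordered, emitted = [], set()
    remaining = places[:]
    while remaining:
        progressed = False
        for obj in list(remaining):
            parent = obj["fields"].get("parent")
            if parent is None or parent in emitted:
                ordered.append(obj)
                emitted.add(obj["pk"])
                remaining.remove(obj)
                progressed = True
        if not progressed:  # a cycle; append whatever is left as-is
            ordered.extend(remaining)
            break

    with open(out_path, "w") as fh:
        json.dump(ordered + others, fh, indent=2)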