How can I make a fixture out of a QuerySet in Django?

Django's dumpdata command is broken because it does not support any reasonable way to narrow down the amount of data dumped. I need to create a fixture from various querysets (and I don't need to take care of dumping objects from outer models relations). Limiting the number of items for those querysets, as django-test-utils' makefixture does, is not sufficient. I tried to achieve this with a proxy model and a custom manager, but that approach does not work: dumpdata omits proxy models (which is reasonable).

If dumpdata doesn't work, you can do the same through Django's serialization framework:
from django.core import serializers
data = serializers.serialize("json", SomeModel.objects.all())
and then write the data to a file.
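Since serialize accepts any queryset or iterable of model instances, narrowing the dump is just a matter of filtering first. A minimal sketch (SomeModel and the filter condition are placeholders):
from django.core import serializers

# any queryset works, so the fixture can be narrowed arbitrarily
data = serializers.serialize("json", SomeModel.objects.filter(active=True)[:100])
with open("some_model_fixture.json", "w") as out:
    out.write(data)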

The following steps make the solution complete, adding support for creating a fixture from various querysets.
from django.core import serializers
from django.core.management.commands.dumpdata import sort_dependencies
app_list = {}
# Add all your querysets here. The key for the dictionary can be just a
# unique dummy string (A safe hack after reading django code)
app_list['app1_name'] = FirstModel.objects.all()
app_list['app2_name'] = SecondModel.objects.all()
# sort_dependencies will ensure the models are sorted so that those with
# foreign keys are taken care of. If SecondModel has an FK to FirstModel,
# sort_dependencies will order the JSON file so that FirstModel comes
# first in the fixture, preventing ambiguity when reloading
data = serializers.serialize("json", sort_dependencies(app_list.items()))
with open('output.json', 'w') as f:
    f.write(data)
Now the output will be available in the output.json file. To rebuild the models from the JSON file:
from django.core import serializers
for obj in serializers.deserialize('json', open('output.json').read()):
    obj.save()
EDIT: Strangely, sort_dependencies didn't work as expected, so I ended up using Python's OrderedDict and decided the order myself.
import collections
app_list = collections.OrderedDict()
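For completeness, a sketch of that manual-ordering approach (model names are placeholders; the dict order must put FK targets before the models that reference them):
import collections
import itertools
from django.core import serializers

app_list = collections.OrderedDict()
app_list['first'] = FirstModel.objects.all()    # FK target goes first
app_list['second'] = SecondModel.objects.all()  # referencing model after

data = serializers.serialize("json", itertools.chain.from_iterable(app_list.values()))
with open('output.json', 'w') as f:
    f.write(data)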

In case you want to save json data directly to a file, you can use:
from django.core import serializers
data = YourModel.objects.all()
with open("fixtures.json", "w") as out:
serializers.serialize("json", data, stream=out)

I'm not sure what you mean by "outer models relations", maybe an example would help, but you can pass dumpdata the model you're interested in...
manage.py dumpdata --help
Usage: ./manage.py dumpdata [options] [appname appname.ModelName ...]
and there's the exclude switch:
-e EXCLUDE, --exclude=EXCLUDE
An appname or appname.ModelName to exclude (use
multiple --exclude to exclude multiple apps/models).
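For example, to dump only two models while excluding another app (app and model names here are placeholders):
./manage.py dumpdata app1.FirstModel app2.SecondModel --exclude auth > fixture.json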

Related

How to Optimize Data Importing to a Django Model using django-import-export library

I have been using the django-import-export library to upload my data as Excel to a Django model, and it worked fine until I had to upload an Excel file with 20,000 rows, which took practically forever.
Can you please suggest the right way to optimize data uploading to the Django model, so that I can easily upload Excel files and have the data saved to my database?
Please support.
Hi, below is the admin.py code I tried, but it throws the error 'using_transactions' is not defined. Please confirm where I am going wrong and what to change to get bulk data imported in less time:
from django.contrib import admin
from import_export import resources
from .models import Station, Customer
from import_export.admin import ImportExportModelAdmin

# Register your models here.
class StationResource(resources.ModelResource):
    def get_or_init_instance(self, instance_loader, row):
        # this call raises NameError: 'using_transactions' is not defined
        self.bulk_create(self, using_transactions, dry_run, raise_errors,
                         batch_size=1000)

    class Meta:
        model = Station
        use_bulk = True
        batch_size = 1000
        force_init_instance = True

class StationAdmin(ImportExportModelAdmin):
    resource_class = StationResource

admin.site.register(Station, StationAdmin)
and in settings.py I have set:
IMPORT_EXPORT_USE_TRANSACTIONS = True
IMPORT_EXPORT_SKIP_ADMIN_LOG = True
import-export provides a bulk import mode which makes use of Django's bulk operations.
Simply enable the use_bulk flag on your resource:
class Meta:
    model = Book
    fields = ('id', 'name', 'author_email', 'price')
    use_bulk = True
It should be possible to import 20k rows in a few seconds, but it will depend on your data and you may need to tweak some settings. Also, do read the caveats regarding bulk imports.
There is more detailed information in the repo.
However, even without bulk mode it should be possible to import 20k rows in a few minutes. If it is taking much longer, then the import process is probably making unnecessary reads on the db (i.e. one for each row). Enabling SQL logging will shed some light on this. CachedInstanceLoader may help with this.
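For reference, wiring that up is one Meta option on the resource (a sketch against the import-export API, reusing the Book example from above):
from import_export import resources
from import_export.instance_loaders import CachedInstanceLoader

class BookResource(resources.ModelResource):
    class Meta:
        model = Book
        use_bulk = True
        # cache existing rows up front instead of one read per row
        instance_loader_class = CachedInstanceLoader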
I wouldn't use import-export for large data.
Instead, I'd save the Excel file as CSV and use pandas to bridge the data into the database; pandas does it in batches.
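A minimal sketch of that approach, assuming the target table matches the model's columns (Django's default table name is <app>_<model>) and a SQLAlchemy engine URL for your database:
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@localhost/dbname")
df = pd.read_csv("stations.csv")
# writes in batches straight to the table; this bypasses the ORM,
# so model validation and signals do not run
df.to_sql("myapp_station", engine, if_exists="append",
          index=False, chunksize=1000)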

Copy a database column into another in Django

I am writing a migration that requires me to fill a field with existing data from another field (with the same type and constraints). Is it possible in Django to copy the data in a single operation? The destination column already exists when I need to copy the data.
In SQL, I would have written something like that:
UPDATE my_table SET column_b = column_a;
Edit
The current answer proposes to loop over the model instances, but that is what I want to avoid. Can it be done without a loop?
As the comment mentioned, you can simply write a migration for this. I think the below should work, though I haven't tested it. It uses the queryset update API and an F expression to avoid looping:
from __future__ import unicode_literals

from django.db import migrations
from django.db.models import F


def copy_field(apps, schema_editor):
    MyModel = apps.get_model('<your app>', 'MyModel')
    # single UPDATE ... SET column_b = column_a, matching the question's SQL
    MyModel.objects.all().update(column_b=F('column_a'))


class Migration(migrations.Migration):

    dependencies = [
        ('<your app>', '<previous migration>'),
    ]

    operations = [
        migrations.RunPython(code=copy_field),
    ]
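If you'd rather keep it in SQL, the same single-statement UPDATE from the question can run inside a migration as well (a sketch using migrations.RunSQL):
from django.db import migrations


class Migration(migrations.Migration):

    dependencies = [
        ('<your app>', '<previous migration>'),
    ]

    operations = [
        # the exact UPDATE from the question, executed in one statement
        migrations.RunSQL("UPDATE my_table SET column_b = column_a;"),
    ]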

Import CSV to Django Model

I have a CSV file backed up from my Django model with django-import-export, and I want to restore it to my model. How can I do that?
When I try to create an object for each row, I have problems with foreign keys.
id,name,address,job,KBSE
1,Hamid,3,Doctor,4311
2,Ali,7,Artist,5343
3,Reza,2,Singer,5232
See Import data workflow. Most functions can be overridden in a resource subclass. If that does not help, please open an issue.
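If the trouble is that columns like job or KBSE reference related models, a resource with ForeignKeyWidget can resolve them during import. A sketch; Person, Job, and the lookup field are assumptions based on the sample CSV:
from import_export import fields, resources
from import_export.widgets import ForeignKeyWidget

from .models import Person, Job  # hypothetical models


class PersonResource(resources.ModelResource):
    # resolve the "job" column to a Job instance by its "name" field
    # instead of expecting a raw primary key
    job = fields.Field(
        column_name='job',
        attribute='job',
        widget=ForeignKeyWidget(Job, 'name'))

    class Meta:
        model = Person
        fields = ('id', 'name', 'address', 'job', 'KBSE')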
You can use a customized Python script using pandas to load CSV data into a Django model.
# First define your Django environment variables
import os
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "DjangoProjectName.settings")
import django
django.setup()

import pandas as pd

# Import the required Django models
from djangoApp.models import *  # models

# Import the CSV file
df = pd.read_csv('csv_url.csv')

# Do the required pre-processing on the pandas dataframe,
# such as data cleaning, data format settings etc.

# Iterate through the pandas dataframe and save data in the Django model
for index, row in df.iterrows():
    # create a Django model instance
    samplemodelObject = SampleModel()
    # normal (non-foreign-key) fields
    samplemodelObject.field_name01 = row['Field_01']
    # foreign key field: look up the related instance first
    samplemodelObject.field_foreignkey = ForeignKeyModel.objects.get(
        fk_key=row['fk_value_field'])
    # save the model instance
    samplemodelObject.save()
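If the per-row save() calls turn out to be the bottleneck, the same loop can collect instances and insert them in one query per batch. A sketch under the same assumed names; note that bulk_create skips save() and model signals:
instances = []
for index, row in df.iterrows():
    instances.append(SampleModel(
        field_name01=row['Field_01'],
        field_foreignkey=ForeignKeyModel.objects.get(
            fk_key=row['fk_value_field']),
    ))

# one INSERT per batch instead of one per row
SampleModel.objects.bulk_create(instances, batch_size=1000)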
You should import the tables in the right order: make sure you import all the data you depend on first.
So load the foreign tables and then load the current one. If you don't have the foreign data and some of it was erased, you'll have to recreate it.
Good luck!

How to load fixtures in Django south migrations properly?

I am using Django 1.5b1 and south migrations and life has generally been great. I have some schema updates which create my database, with a User table among others. I then load a fixture for ff.User (my custom user model):
def forwards(self, orm):
    from django.core.management import call_command
    fixture_path = "/absolute/path/to/my/fixture/load_initial_users.json"
    call_command("loaddata", fixture_path)
All was working great until I added another field to my ff.User model, much further down the migration line. My fixture load now breaks:
DatabaseError: Problem installing fixture 'C:\<redacted>create_users.json':
Could not load ff.User(pk=1): (1054, "Unknown column 'timezone_id' in 'field list'")
Timezone is the field (ForeignKey) which I added to my user model.
The ff.User differs from what is in the database, so the Django ORM gives up with a DB error. Unfortunately, I cannot specify my model in my fixture as orm['ff.User'], which seems to be the south way of doing things.
How should I load fixtures properly using south so that they do not break once the models for which these fixtures are for gets modified?
I found a Django snippet that does the job!
https://djangosnippets.org/snippets/2897/
It loads the data according to the models frozen in the fixture rather than the actual model definition in your app's code! Works perfectly for me.
I proposed a solution that might interest you too:
https://stackoverflow.com/a/21631815/797941
Basically, this is how I load my fixture:
from south.v2 import DataMigration
import json

class Migration(DataMigration):

    def forwards(self, orm):
        json_data = open("path/to/your/fixture.json")
        items = json.load(json_data)
        for item in items:
            # Be careful, this lazy line won't resolve foreign keys
            obj = orm[item["model"]](**item["fields"])
            obj.save()
        json_data.close()
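Because that lazy line passes fixture fields straight to the model constructor, rows containing foreign keys fail: the fixture stores the related pk under the plain field name. One workaround I can sketch (assuming the South-era Django API, where relational fields expose rel and attname) is to redirect those values to the underlying *_id column; many-to-many fields would still need separate handling:
def forwards(self, orm):
    json_data = open("path/to/your/fixture.json")
    items = json.load(json_data)
    for item in items:
        model = orm[item["model"]]
        fields = {}
        for name, value in item["fields"].items():
            field = model._meta.get_field(name)
            if field.rel is not None:
                # the fixture holds the related pk, so assign it to the
                # raw db column, e.g. "timezone" -> "timezone_id"
                name = field.attname
            fields[name] = value
        obj = model(pk=item["pk"], **fields)
        obj.save()
    json_data.close()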
This was a frustrating part of using fixtures for me as well. My solution was to make a few helper tools. One creates fixtures by sampling data from a database and includes the South migration history in the fixtures.
There's also a tool to add South migration history to existing fixtures.
The third tool checks out the commit at which the fixture was last modified, loads the fixture, then checks out the most recent commit, runs a south migration, and dumps the migrated db back to the fixture. This is done in a separate database so your default db doesn't get stomped on.
The first two can be considered beta code, and the third should be treated as usable alpha, but they're already being quite helpful to me.
Would love to get some feedback from others:
git@github.com:JivanAmara/django_fixture_tools.git
Currently, it only supports projects using git as the RCS.
The most elegant solution I've found is here, whereby your app's models.get_model function is switched out to supply the model from the given orm instead. It's then set back after the fixture is applied.
from django.db import models
from django.core.management import call_command

def load_fixture(file_name, orm):
    original_get_model = models.get_model

    def get_model_southern_style(*args):
        try:
            return orm['.'.join(args)]
        except:
            return original_get_model(*args)

    models.get_model = get_model_southern_style
    call_command('loaddata', file_name)
    models.get_model = original_get_model
You call it with load_fixture('my_fixture.json', orm) from within your forwards definition.
Generally South handles migrations using forwards() and backwards() functions. In your case you should either:
alter the fixtures to contain the proper data, or
import the fixture before the migration that breaks it (or within the same migration, but before altering the schema).
In the second case, before the migration adding (or, as in your case, removing) the column, you should perform a migration that explicitly loads the fixtures, similar to this (docs):
def forwards(self, orm):
    from django.core.management import call_command
    call_command("loaddata", "create_users.json")
I believe this is the easiest way to accomplish what you need. Also make sure you do not make simple mistakes like trying to import data with the new structure before applying the older migrations.
Reading the following two posts has helped me come up with a solution:
http://andrewingram.net/2012/dec/common-pitfalls-django-south/#be-careful-with-fixtures
http://news.ycombinator.com/item?id=4872596
Specifically, I rewrote my data migrations to use the output from 'dumpscript'.
I needed to modify the resulting script a bit to work with south. Instead of doing
from ff.models import User
I do
User = orm['ff.User']
This works exactly like I wanted it to. Additionally, it has the benefit of not hard-coding IDs, like fixtures require.

Automatically import all db tables in manage.py shell

Is there a snippet or easy way to import all of my django tables when entering the prompt?
For example, usually my commands go something like this:
>>> from userprofile.models import Table
>>> Table.objects...
This way, as soon as I entered the prompt, I'd already have the tables imported. Thank you.
django-extensions adds the shell_plus command for manage.py which does exactly this.
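Assuming django-extensions is installed and listed in INSTALLED_APPS, that is:
pip install django-extensions
# settings.py: add 'django_extensions' to INSTALLED_APPS
python manage.py shell_plus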
from django.db.models import get_models
for _class in get_models():
    globals()[_class.__name__] = _class
Here you end up with all installed models available globally, referring to them with their class name. Read the docs for django.db.models.get_models for more info:
Definition: get_models(self, app_mod=None, include_auto_created=False, include_deferred=False)
Docstring:
Given a module containing models, returns a list of the models.
Otherwise returns a list of all installed models.
By default, auto-created models (i.e., m2m models without an
explicit intermediate table) are not included. However, if you
specify include_auto_created=True, they will be.
By default, models created to satisfy deferred attribute
queries are not included in the list of models. However, if
you specify include_deferred, they will be.
You can do this:
>>> from django.conf import settings
>>> for app in settings.INSTALLED_APPS:
...     module = __import__(app)
...     globals()[module.__name__] = module
...
You'll get the fully qualified names though; userprofile.models.Table instead of Table.