I have a CSV file backed up from my Django model with django-import-export, and I want to restore it to my model. How can I do that?
When I try to create an object for each row, I have a problem with the foreign keys.
id,name,address,job,KBSE
1,Hamid,3,Doctor,4311
2,Ali,7,Artist,5343
3,Reza,2,Singer,5232
See the Import data workflow. Most functions can be overridden in a resource subclass. If that does not help, please open an issue.
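For the foreign key column specifically, one common approach is to declare it on the resource with a ForeignKeyWidget so the raw value in the CSV is resolved to a related object on import. A minimal sketch, assuming the models are called Person and Address (names not given in the question) and that the address column holds the Address primary key:

from import_export import resources, fields
from import_export.widgets import ForeignKeyWidget
from myapp.models import Person, Address  # model names are assumptions

class PersonResource(resources.ModelResource):
    # resolve the CSV "address" column to an Address instance by its id
    address = fields.Field(
        column_name='address',
        attribute='address',
        widget=ForeignKeyWidget(Address, 'id'))

    class Meta:
        model = Person
        fields = ('id', 'name', 'address', 'job', 'KBSE')

The backup could then be restored with something like:

from tablib import Dataset

with open('backup.csv') as f:
    dataset = Dataset().load(f.read(), format='csv')
result = PersonResource().import_data(dataset, dry_run=True)  # check result.has_errors() before a real run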
You can use a customized Python script with pandas to load the CSV data into your Django model.
# First set up the Django environment variables
import os
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "DjangoProjectName.settings")
import django
django.setup()

import pandas as pd
import numpy as np

# Import the required Django models
from djangoApp.models import *  # models

# Read the CSV file
df = pd.read_csv('csv_url.csv')

# Do any required pre-processing on the pandas dataframe here,
# such as data cleaning, data format conversions etc.

# Iterate through the pandas dataframe and save the data in the Django model
for index, row in df.iterrows():
    # create a model instance
    samplemodelObject = SampleModel()
    # normal (non-foreign-key) fields
    samplemodelObject.field_name01 = row['Field_01']
    # foreign key field: look up the related object by its key
    samplemodelObject.field_foreignkey = ForeignKeyModel.objects.get(fk_key=row['fk_value_field'])
    # save the model instance
    samplemodelObject.save()
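If the CSV is large, saving each row individually can be slow. As a rough sketch (not part of the original script), you could collect the instances and insert them with Django's bulk_create instead:

objects = []
for index, row in df.iterrows():
    objects.append(SampleModel(
        field_name01=row['Field_01'],
        field_foreignkey=ForeignKeyModel.objects.get(fk_key=row['fk_value_field']),
    ))
# one INSERT per batch instead of one per row
SampleModel.objects.bulk_create(objects, batch_size=500)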
You should import the tables in the right order: make sure you import all the data that you depend on first.
So load the foreign tables and then load the current one, as in the sketch below. If you don't have the foreign data and some of it was erased, you'll have to recreate it.
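As a rough sketch of that order (model and file names here are assumptions, not taken from the question), using django-import-export resources:

from tablib import Dataset
from import_export import resources
from myapp.models import Address, Person  # assumed models: Person has a FK to Address

# import the referenced table first...
address_resource = resources.modelresource_factory(model=Address)()
with open('addresses.csv') as f:
    address_resource.import_data(Dataset().load(f.read(), format='csv'), dry_run=False)

# ...then the table that holds the foreign key
person_resource = resources.modelresource_factory(model=Person)()
with open('people.csv') as f:
    person_resource.import_data(Dataset().load(f.read(), format='csv'), dry_run=False)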
Good luck!
I have been using the django-import-export library to upload my data as Excel files to a Django model, and it worked fine until I had to upload an Excel file with 20,000 rows, which took practically forever.
Can you please suggest the right way to optimize data uploading to the Django model, so that I can easily upload Excel files and have the data saved in my database?
Below is the code for admin.py that I tried, but it throws the error 'using_transactions' is not defined. Please confirm where I am going wrong and what I should change to get bulk data imported in less time:
from django.contrib import admin
from import_export import resources
from .models import Station,Customer
from import_export.admin import ImportExportModelAdmin
# Register your models here.
class StationResource(resources.ModelResource):
    def get_or_init_instance(self, instance_loader, row):
        self.bulk_create(self, using_transactions, dry_run, raise_errors,
                         batch_size=1000)

    class Meta:
        model = Station
        use_bulk = True
        batch_size = 1000
        force_init_instance = True

class StationAdmin(ImportExportModelAdmin):
    resource_class = StationResource

admin.site.register(Station, StationAdmin)
And in the settings.py file I have set:
IMPORT_EXPORT_USE_TRANSACTIONS = True
IMPORT_EXPORT_SKIP_ADMIN_LOG = True
import-export provides a bulk import mode which makes use of Django's bulk operations.
Simply enable the use_bulk flag on your resource:
class Meta:
    model = Book
    fields = ('id', 'name', 'author_email', 'price')
    use_bulk = True
It should be possible to import 20k rows in a few seconds, but it will depend on your data and you may need to tweak some settings. Also, do read the caveats regarding bulk imports.
There is more detailed information in the repo.
However, even without bulk mode it should be possible to import 20k rows in a few minutes. If it is taking much longer, the import process is probably making unnecessary reads on the db (i.e. one query per row). Enabling SQL logging will shed some light on this. CachedInstanceLoader may help with this.
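A hedged sketch of a resource combining these options (Book is the placeholder model from the example above; CachedInstanceLoader fetches existing rows in a single query instead of one query per row):

from import_export import resources
from import_export.instance_loaders import CachedInstanceLoader
from myapp.models import Book  # placeholder model

class BookResource(resources.ModelResource):
    class Meta:
        model = Book
        fields = ('id', 'name', 'author_email', 'price')
        use_bulk = True
        batch_size = 1000
        skip_diff = True  # skip per-row diffing, which is expensive and unneeded for bulk loads
        instance_loader_class = CachedInstanceLoader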
I wouldn't use import-export for large data.
Instead, I'd save the data as CSV from my Excel file and use pandas to bridge the data into the database; pandas can write it in batches.
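A minimal sketch of that approach, assuming a PostgreSQL database and a table named myapp_station (the default db_table Django would generate for a Station model); the connection URL is a placeholder:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql://user:password@localhost:5432/mydb')  # placeholder credentials

df = pd.read_excel('stations.xlsx')  # or pd.read_csv('stations.csv')
# chunksize controls how many rows are written per batch
df.to_sql('myapp_station', engine, if_exists='append', index=False, chunksize=1000, method='multi')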
While using Django, whenever I need to do something with data from the DB, I've always used the Django shell. This time, I want to do something like the scenario below in the Django shell.
I have a model Store with a bunch of stores. In Django shell,
1. import Store model
2. For each store name, search for that name and download a csv file based on it.
3. Save the downloaded csv file to a FileField on the Store model.
I know how to do #1 and #3, but I'm not sure how to do #2. How can I take a store name and download a csv file based on it in the Django shell?
You can use the requests library to fetch data from the internet. Install it like this:
pip install requests
Then, in Django, you can use ContentFile to save the downloaded data into a FileField. Here's an example:
import requests
from django.core.files.base import ContentFile
from .models import SomeModel
response = requests.get('https://raw.githubusercontent.com/wireservice/csvkit/master/examples/test_geo.csv')
s = SomeModel.objects.get(id=0)
s.file_field.save('data.csv', ContentFile(response.content))
s.save()
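To tie this back to the scenario in the question, a loop over the Store queryset might look like the sketch below; the URL pattern built from the store name and the field name csv_file are purely hypothetical and depend on where the CSV files actually live and how the model is defined:

import requests
from django.core.files.base import ContentFile
from .models import Store

for store in Store.objects.all():
    # hypothetical endpoint that returns a CSV for a given store name
    url = f'https://example.com/reports/{store.name}.csv'
    response = requests.get(url)
    if response.status_code == 200:
        # FileField.save() persists the model instance by default
        store.csv_file.save(f'{store.name}.csv', ContentFile(response.content))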
I am trying to use Django's db connection variable to insert a pandas dataframe to Postgres database. The code I use is
df.to_sql('forecast',connection,if_exists='append',index=False)
And I get the following error
Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': relation "sqlite_master" does not exist
LINE 1: SELECT name FROM sqlite_master WHERE type='table' AND name=?...
I think this happens because the Django connection object is not a SQLAlchemy object, and therefore pandas assumes I am using SQLite. Is there any way to use .to_sql other than making another connection to the database?
It is possible to create the db configuration in the settings.py file:
DATABASES = {
    'default': env.db('DATABASE_URL_DEFAULT'),
    'other': env.db('DATABASE_URL_OTHER'),
}
DB_URI_DEFAULT=env.str('DATABASE_URL_DEFAULT')
DB_URI_OTHER=env.str('DATABASE_URL_OTHER')
If you want to create a SQLAlchemy connection, you should use DB_URI_DEFAULT or DB_URI_OTHER.
In the __init__ method of the class where you will use the .to_sql method, you should write:
from you_project_app import settings
from sqlalchemy import create_engine
import pandas as pd
class Example:
    def __init__(self):
        self.conn_default = create_engine(settings.DB_URI_DEFAULT).connect()
And when you use the .to_sql method of pandas, it should look like this:
df_insert.to_sql(table, if_exists='append',index=False,con=self.conn_default)
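If you don't keep a separate DATABASE_URL-style setting around, another option (a sketch, not part of the answer above) is to build the SQLAlchemy URL from Django's own DATABASES dict, which uses the standard NAME/USER/PASSWORD/HOST/PORT keys:

from django.conf import settings
from sqlalchemy import create_engine

db = settings.DATABASES['default']
url = (
    f"postgresql://{db['USER']}:{db['PASSWORD']}"
    f"@{db['HOST']}:{db['PORT']}/{db['NAME']}"
)
engine = create_engine(url)
df_insert.to_sql('forecast', con=engine, if_exists='append', index=False)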
I am currently on Django 1.7 and my database is PostgreSQL. I have the following model.
# myapp/models.py
from django.db import models
from json_field import JSONField # using django-json-field
from json_field.fields import JSONDecoder
class MyModel(models.Model):
    the_data = JSONField(
        decoder_kwargs={'cls': JSONDecoder, 'parse_float': float},
        default='[{}]',
    )
    ...
I am now looking to upgrade to Django 1.10 and take advantage of the new(ish) JSONField in django.contrib.postgres.fields.
I change my model to look like this.
# myapp/models.py
from django.db import models
from django.contrib.postgres.fields import JSONField # Now want to use the new JSONField
class MyModel(models.Model):
    the_data = JSONField(
        default='{}',
    )
    ...
I then create a migration for the app.
./manage.py makemigrations myapp
When it attempts to create a migration it complains...
from json_field.forms import JSONFormField
File "/Users/travismillward/Projects/amcards_env/lib/python2.7/site-packages/json_field/forms.py", line 5, in <module>
from django.forms import fields, util
ImportError: cannot import name util
I understand why it is complaining. django-json-field has not been updated for Django 1.10, and one of the original migration files wants to import json_field. So I can either go back and modify that original migration file, but then it won't actually change the column data type because Django thinks the migration has already run; or I have to patch django-json-field to work with Django 1.10 just so the migration can be created, and leave that requirement in place even though I no longer use it, purely for the migration!
On my last project I just modified the original migration to make it think that it was using django.contrib.postgres.fields.jsonb.JSONField all along. However, after I ran the migration, it didn't change the column's data type from text to jsonb. So I manually did that since it was a smaller project. For this project, I really don't want to manually alter the database.
Any suggestions on how to migrate away from django-json-field gracefully and with a plan to remove it from my code and requirements?
Django's dumpdata command is broken because it does not support any reasonable way to narrow down the amount of data dumped. I need to create a fixture from various querysets (and I don't need to take care of dumping objects from outer models relations). Limiting the number of items for those querysets, the way django-test-utils' makefixture does, is not sufficient. I tried to achieve this with a proxy model and a custom manager, but that approach does not work: dumpdata omits proxy models (which is reasonable).
If dumpdata doesn't work, you can do the same thing with Django's serialization framework:
from django.core import serializers
data = serializers.serialize("json", SomeModel.objects.all())
and then write the data to a file.
The following steps will make the solution complete, adding support for creating a fixture from various querysets.
from django.core import serializers
from django.core.management.commands.dumpdata import sort_dependencies
app_list = {}
# Add all your querysets here. The key for the dictionary can be just a
# unique dummy string (A safe hack after reading django code)
app_list['app1_name'] = FirstModel.objects.all()
app_list['app2_name'] = SecondModel.objects.all()
# sort_dependencies will ensure the models are ordered so that
# foreign key targets come first. If SecondModel has a FK to FirstModel,
# sort_dependencies puts FirstModel first in the JSON file,
# preventing ambiguity when reloading the fixture
data = serializers.serialize("json", sort_dependencies(app_list.items()))
f = open('output.json', 'w')
f.write(data)
f.close()
Now the output will be available in output.json file. To rebuild the models from the json file:
from django.core import serializers
for obj in serializers.deserialize('json', open('output.json').read()):
    obj.save()
EDIT: Strangely, sort_dependencies didn't work as expected, so I ended up using Python's OrderedDict and decided the order myself.
import collections
app_list = collections.OrderedDict()
In case you want to save json data directly to a file, you can use:
from django.core import serializers
data = YourModel.objects.all()
with open("fixtures.json", "w") as out:
    serializers.serialize("json", data, stream=out)
I'm not sure what you mean by "outer models relations" (maybe an example would help), but you can pass dumpdata the model you're interested in...
manage.py dumpdata --help
Usage: ./manage.py dumpdata [options] [appname appname.ModelName ...]
and there's the exclude switch:
-e EXCLUDE, --exclude=EXCLUDE
An appname or appname.ModelName to exclude (use
multiple --exclude to exclude multiple apps/models).
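For example, to dump just one model while excluding a couple of apps (the app and model names here are placeholders):

./manage.py dumpdata myapp.SomeModel --exclude contenttypes --exclude auth --indent 2 > fixture.json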