Chunking a django-import-export - django

I am reading this article about chunking a large database operation. I am also using django-import-export and django-import-export-celery in my admin site and I would like to integrate chunking into them.
The problem is that django-import-export already handles the file import, as well as the whole import process, in the background.
I tried using django-import-export's bulk imports, but one of the caveats is:
Bulk operations do not work with many-to-many relationships.
So we thought chunking would be the alternative. Is it possible to perform chunking inside django-import-export?
UPDATE - ADDED MODEL:
from django.db import models
class Profile(models.Model):
    firstname = models.CharField(max_length=200)
    lastname = models.CharField(max_length=200, blank=True, null=True)
    email = models.EmailField(max_length=200)
    associated_issuer = models.ManyToManyField('app.Issuer', related_name='add_issuer', blank=True)
    associated_profile = models.ManyToManyField('app.Profile', related_name='add_profile', blank=True)
class Issuer(models.Model):
    # personal information
    name = models.CharField(max_length=200, blank=True, null=True)
    contact_email = models.EmailField(max_length=200, blank=True, null=True)
    associated_profile = models.ManyToManyField('app.Profile', related_name='add_profile_2', blank=True)
    associated_issuer = models.ManyToManyField('app.Issuer', related_name='add_issuer_2', blank=True)
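One way to picture chunking here is to split the uploaded rows yourself and hand each batch to a normal (non-bulk) import. The sketch below keeps only the batching logic and is library-free; `import_chunk` is a hypothetical callable, not part of django-import-export, and the batch size is yours to choose.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(rows: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` rows."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def import_in_chunks(
    rows: Iterable[T], size: int, import_chunk: Callable[[List[T]], None]
) -> int:
    """Pass each batch to the hypothetical `import_chunk` and return the row count."""
    total = 0
    for batch in chunked(rows, size):
        import_chunk(batch)
        total += len(batch)
    return total
```

With django-import-export, `import_chunk` might wrap each batch in `tablib.Dataset(*batch, headers=headers)` and call `import_data(dataset, dry_run=False)` on a resource such as a hypothetical `ProfileResource()`, so each call saves its rows (many-to-many fields included, as in a regular import) one batch at a time. How this would plug into django-import-export-celery's own task is not covered by this sketch.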

Related

django multilevel nested formsets

Django provides inline formsets, which (in the examples I have found) allow up to three levels of nesting, but I need much more complex nesting. It should be fully dynamic: each level could be one-to-one, but it could also be one-to-many. So far I only have the models below, but they could be extended with additional sublevels.
from django.db import models
class Srts(models.Model):
    data = models.CharField(max_length=10, blank=True, null=True)
class Volume(models.Model):
    srts = models.ForeignKey('Srts', on_delete=models.CASCADE)
    name = models.CharField(max_length=120, blank=True, null=True)
class Qtree(models.Model):
    volume = models.ForeignKey('Volume', on_delete=models.CASCADE)
    name = models.CharField(max_length=120)
class Server(models.Model):
    qtree = models.ForeignKey('Qtree', on_delete=models.CASCADE)
    hostname = models.CharField(max_length=120, blank=True, null=True)
class CifsPermission(models.Model):
    qtree = models.ForeignKey('Qtree', on_delete=models.CASCADE)
    group = models.CharField(max_length=30, blank=True, null=True, default='None')
    permission = models.CharField(max_length=30, blank=True, null=True, default='None')
I have been googling a lot over the last few days, but there is not much out there.
Some examples I found:
django-nested-inline-formsets-example - basic, only three levels
Django-better forms - can handle multiple forms in one submit, but not formsets
django-nested-inline - only for the admin page
Should I instead work with a form that is not tied to a model, then split the data and apply the appropriate logic myself before saving it to the models?
I couldn't add the image (a server error occurred), so here is the link:
https://imgur.com/a/NQBR6tJ
I would like to do something similar in a normal view, not in the admin.
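If you take the plain-form route, the flat field names the browser submits have to be turned back into a tree before anything is saved. A minimal sketch, assuming a hypothetical naming convention such as `volume-0-qtree-1-name` (level name, index, then the field); in a view, `data` could be `request.POST.dict()`.

```python
import re
from typing import Any, Dict, Mapping

# Hypothetical convention: "<level>-<index>-" repeated, then the field name,
# e.g. "volume-0-qtree-1-name".
_SEGMENT = re.compile(r"([a-z_]+)-(\d+)-")


def nest_post_data(data: Mapping[str, str]) -> Dict[str, Any]:
    """Turn flat keys such as 'volume-0-qtree-1-name' into nested dicts.

    Child rows are stored as {level: {index: {...}}}, so indices need not be
    contiguous (rows deleted in the browser simply leave a gap).
    """
    root: Dict[str, Any] = {}
    for key, value in data.items():
        node = root
        pos = 0
        # Walk down one "<level>-<index>-" segment at a time.
        while (match := _SEGMENT.match(key, pos)) is not None:
            level, index = match.group(1), int(match.group(2))
            node = node.setdefault(level, {}).setdefault(index, {})
            pos = match.end()
        # Whatever is left is the field name at the deepest level.
        node[key[pos:]] = value
    return root
```

The resulting tree can then be walked top-down (Srts, then its Volume rows, then each Qtree, and so on), so every child is created after its parent and can be given the parent's primary key. The naming scheme is only an assumption; any scheme that encodes the path to each field would work.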

Django One to Many Loose Relationship

Backstory
Data is being pulled from an accounting system that can have department, market, and client relationship data associated with it. The relationship data are all TEXT/CHAR fields. They are not integer columns. There are over 2 million rows.
The problem
The problem I've been running into is how to add the lines without relationship validation, which could fail because the related table (like Market) is missing the value or the value has changed (because we are looking far back in the past). The other problems are the naming of the columns in the Django database (more detail below) and querying Django models with a join when the class has no relationship attribute.
models.py
from datetime import date
from decimal import Decimal
from typing import Optional
from django.db import models
class Line(models.Model):
    entry_number: int = models.IntegerField(db_index=True)
    posting_date: date = models.DateField()
    document_number: Optional[str] = models.CharField(max_length=150, null=True, default=None)
    description: Optional[str] = models.CharField(max_length=150, null=True, default=None)
    department: Optional[str] = models.CharField(max_length=150, null=True, default=None)
    market: Optional[str] = models.CharField(max_length=150, null=True, default=None)
    amount: Decimal = models.DecimalField(max_digits=18, decimal_places=2)
    client: Optional[str] = models.CharField(max_length=150, null=True, default=None)
    # This relationship works (the Account model is not shown)
    account = models.ForeignKey(Account, on_delete=models.DO_NOTHING, related_name='lines')
class Department(models.Model):
    code: str = models.CharField(max_length=10, db_index=True, unique=True, primary_key=True)
    name: str = models.CharField(max_length=100, null=True)
class Market(models.Model):
    code: str = models.CharField(max_length=10, db_index=True, unique=True, primary_key=True)
    name: str = models.CharField(max_length=100, null=True)
The data for Line is filled in by a SQL statement that pulls it from the accounting system.
In SQLAlchemy
What I am looking for is something like the following, which can be expressed in SQLAlchemy:
class Line(...):
    # snipped
    client_rel: Client = relationship("Client", primaryjoin=client == foreign(Client.code), viewonly=True)
    department_rel: Department = relationship("Department", primaryjoin=department == foreign(Department.code), viewonly=True)
    market_rel: Market = relationship("Market", primaryjoin=market == foreign(Market.code), viewonly=True)
# snipped the other related classes
The SQLAlchemy code allows primaryjoin conditions, which define relationships that are enforced only at the Python level, not in the database. It also allows the relationship's loose foreign key column to be named almost anything. As far as I can tell, Django's foreign key column has to be "name"_id and can't be changed.
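If declaring a relation is not an option, one fallback is to resolve the text codes in Python: load the small lookup table once into a dict and attach the names, so a missing or changed code just yields None. This is a plain-Python sketch of that swapped-in technique, not a Django relationship; in Django, `lookup` could come from `Market.objects.values_list("code", "name")` and `lines` from `Line.objects.values("entry_number", "market")`. (Separately, Django's `ForeignKey` has `db_column`, `to_field` and `db_constraint` options that may be worth checking in the docs for this use case.)

```python
from typing import Dict, Iterable, List, Mapping, Optional, Tuple


def attach_names(
    lines: Iterable[Mapping[str, Optional[str]]],
    lookup: Iterable[Tuple[str, Optional[str]]],
    code_field: str,
) -> List[Dict[str, Optional[str]]]:
    """Return copies of `lines` with a '<code_field>_name' key added.

    Codes missing from the lookup table resolve to None instead of raising,
    which mimics a 'loose' relationship with no database constraint.
    """
    names = dict(lookup)  # code -> name, built once
    result = []
    for line in lines:
        row = dict(line)  # copy so the input rows are not modified
        code = row.get(code_field)
        row[f"{code_field}_name"] = names.get(code) if code is not None else None
        result.append(row)
    return result
```

Because the lookup is one dict built up front, this costs one extra query per lookup table rather than one per line, which matters with millions of rows.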

Django Many to Many Data Duplication?

Background
I'm storing data about researchers, e.g. researcher profiles, metrics for each researcher, journals they published in, papers they have, etc.
The Problem
My current database design is this:
Each Researcher has many journals (the ones they published in), and each journal stores information about itself.
Likewise for Subject Areas.
But currently this leads to massive data duplication. For example, the same journal can appear many times in the Journal table, just linked to a different researcher.
Is there a better way to tackle this problem? Right now, I have over 5000 rows in the Journal table but only about 1000 distinct journals.
Thank you!
EDIT: This is likely due to the way I'm saving the models for new data (shown below). Could anyone show the proper way to loop over the hashes (dicts) and save them to the models?
Model - Researcher
from django.db import models
from django.db.models import JSONField  # Django 3.1+; older versions used django.contrib.postgres.fields.JSONField
class Researcher(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)
    scopus_id = models.BigIntegerField(db_index=True)  # Index to make searches quicker
    academic_rank = models.CharField(max_length=100)
    title = models.CharField(max_length=200, default=None, blank=True, null=True)
    salutation = models.CharField(max_length=200, default=None, blank=True, null=True)
    scopus_first_name = models.CharField(max_length=100)
    scopus_last_name = models.CharField(max_length=100)
    affiliation = models.CharField(default=None, blank=True, null=True, max_length=255)
    department = models.CharField(default=None, blank=True, null=True, max_length=255)
    email = models.EmailField(default=None, blank=True, null=True)
    properties = JSONField(default=dict)
    def __str__(self):
        return "{} {}, Scopus ID {}".format(self.scopus_first_name, self.scopus_last_name, self.scopus_id)
Model - Journal
class Journal(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)
    researchers = models.ManyToManyField(Researcher)
    title = models.TextField()
    journal_type = models.CharField(max_length=40, default=None, blank=True, null=True)
    abbreviation = models.TextField(default=None, blank=True, null=True)
    issn = models.CharField(max_length=50, default=None, blank=True, null=True)
    journal_rank = models.IntegerField(default=None, blank=True, null=True)
    properties = JSONField(default=dict)
    def __str__(self):
        return self.title
How I'm currently saving them:
# (printed repr of one dict; <Researcher: ...> is an existing Researcher instance)
db_model_fields = {'abbreviation': 'Front. Artif. Intell. Appl.',
                   'issn': '09226389',
                   'journal_type': 'k',
                   'researchers': <Researcher: x, Scopus ID f>,
                   'title': 'Frontiers in Artificial Intelligence and Applications'}
# Remove "researchers" first, otherwise creating the Journal fails
# (the instance needs an id before a many-to-many relation can be set)
researcher = db_model_fields.pop("researchers")
model_obj = Journal(**db_model_fields)
model_obj.save()
model_obj.researchers.add(researcher)
model_obj.save()
Here is how it works:
class Journal(models.Model):
    # some fields
    ...
class Researcher(models.Model):
    # some fields
    journal = models.ManyToManyField(Journal)
Behind the scenes, Django creates an intermediary join table to represent the many-to-many relationship.
So you will have many rows in this join table, which is expected, but each journal and each researcher will be unique in its own table.
Your problem probably comes from how you save. Instead of:
model_obj = Journal(**db_model_fields)
model_obj.save()
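Try this:
# get_or_create() returns an (object, created) tuple; looking journals up
# by title is an assumption, use whatever identifies a journal for you
model_obj, created = Journal.objects.get_or_create(
    title=db_model_fields["title"], defaults=db_model_fields
)
model_obj.researchers.add(researcher)
This way you get the existing journal if it already exists. Since none of your fields are unique, your current code creates a new journal every time, and nothing stops it because Django generates a new unique ID for each journal you add.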
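The difference between one Journal row per researcher and one Journal row plus join-table links can be shown without a database. A minimal plain-Python sketch, assuming the journal title is what identifies a journal (an assumption; ISSN may be a better key where it is always present): `journals` plays the role of the Journal table and `links` the join table.

```python
from typing import Dict, Iterable, Set, Tuple

Record = Tuple[int, Dict[str, str]]  # (researcher id, journal fields)


def normalise(
    records: Iterable[Record],
) -> Tuple[Dict[str, Dict[str, str]], Set[Tuple[str, int]]]:
    journals: Dict[str, Dict[str, str]] = {}  # one entry per journal title
    links: Set[Tuple[str, int]] = set()  # (title, researcher id) pairs
    for researcher_id, fields in records:
        title = fields["title"]
        # Like get_or_create(): reuse an existing journal, store it only once.
        journals.setdefault(title, fields)
        # Like researchers.add(): adding the same pair twice has no effect.
        links.add((title, researcher_id))
    return journals, links
```

Four incoming records for two journals end up as two journal entries and three distinct links, which is the shape the Journal table and its join table should have.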

django prefetch_related not working

I am trying to export my whole database using prefetch_related, but I only get data from the main model.
My models:
class GvtCompoModel(models.Model):
    gvtCompo = models.CharField(max_length=1000, blank=False, null=False)
    ...
class ActsIdsModel(models.Model):
    year = models.IntegerField(max_length=4, blank=False, null=False)
    ...
class RespProposModel(models.Model):
    respPropos = models.CharField(max_length=50, unique=True)
    nationResp = models.ForeignKey('NationRespModel', blank=True, null=True, default=None)
    nationalPartyResp = models.ForeignKey('NationalPartyRespModel', blank=True, null=True, default=None)
    euGroupResp = models.ForeignKey('EUGroupRespModel', blank=True, null=True, default=None)
class ActsInfoModel(models.Model):
    # id of the act
    actId = models.OneToOneField(ActsIdsModel, primary_key=True)
    respProposId1 = models.ForeignKey('RespProposModel', related_name='respProposId1', blank=True, null=True, default=None)
    respProposId2 = models.ForeignKey('RespProposModel', related_name='respProposId2', blank=True, null=True, default=None)
    respProposId3 = models.ForeignKey('RespProposModel', related_name='respProposId3', blank=True, null=True, default=None)
    gvtCompo = models.ManyToManyField(GvtCompoModel)
My view:
dumpDB = ActsInfoModel.objects.all().prefetch_related("actId", "respProposId1", "respProposId2", "respProposId3", "gvtCompo")
for act in dumpDB.values():
    for field in act:
        print("dumpDB field", field)
When I display "field", I only see the fields from ActsInfoModel, the starting model. Is that normal?
You have misunderstood what prefetch_related does. Its arguments are lookups (names of related fields), and it only changes how those related objects are fetched; it does not add their fields to the dictionaries returned by .values().
(Note that your field naming convention is also very misleading: respProposId1 and actId are not IDs but actual model instances. Django creates an underlying field in each case by appending _id, so the db columns are respProposId1_id and actId_id. You should simply name the fields resp_propos1, resp_propos2 and so on; also note that the usual Python style is lower_case_with_underscores, not camelCase.)
It is normal that you only see fields from ActsInfoModel. You can access the related models with dot notation, like this:
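acts = ActsInfoModel.objects.all().prefetch_related("actId", "respProposId1", "respProposId2", "respProposId3", "gvtCompo")
for act in acts:
    # respProposId1 is nullable, so check it before following the relation
    if act.respProposId1 is not None:
        print(act.respProposId1.respPropos)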
The related models are already prefetched, so this won't produce any additional queries. FYI, a quote from the docs:
Returns a QuerySet that will automatically retrieve, in a single batch, related objects for each of the specified lookups.
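To see what "in a single batch" means, here is a plain-Python simulation (not Django's implementation): instead of one lookup per act, the foreign-key ids are collected, fetched once, and attached to each row. `fetch_many` is a hypothetical stand-in for something like `RespProposModel.objects.in_bulk(ids)`, which returns a dict mapping primary keys to instances.

```python
from typing import Any, Callable, Dict, Hashable, Iterable, List, Set

Row = Dict[str, Any]


def prefetch(
    rows: Iterable[Row],
    fk_field: str,
    fetch_many: Callable[[Set[Hashable]], Dict[Hashable, Any]],
) -> List[Row]:
    """Attach the related object for `fk_field` to every row using ONE lookup.

    Adds a '<fk_field>_obj' key to each row dict (rows are modified in place).
    """
    rows = list(rows)
    ids = {row[fk_field] for row in rows if row[fk_field] is not None}
    related = fetch_many(ids) if ids else {}  # the single batched lookup
    for row in rows:
        # Rows with no related id (or an unknown one) get None.
        row[f"{fk_field}_obj"] = related.get(row[fk_field])
    return rows
```

However many rows share the same ids, `fetch_many` runs once, which is why looping over prefetched objects does not add a query per row.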

Django model: manytomany with more than one object

I have an Event model. Events can have many "presenters". But each presenter can have one of two different types of profile: Profile1 or Profile2. How do I allow both kinds of profile in presenters?
This will be handled entirely in the back end; that is, an admin will be selecting the presenters.
(I don't know whether that matters.)
class Profile1(models.Model):
    user = models.ForeignKey(User, null=True, unique=True)
    first_name = models.CharField(max_length=20, null=True, blank=True)
    last_name = models.CharField(max_length=20, null=True, blank=True)
    created = models.DateTimeField(auto_now_add=True)
    modified = models.DateTimeField(auto_now=True)
    about = models.TextField(null=True, blank=True)
    tags = models.ManyToManyField(Tag, null=True, blank=True)
    country = CountryField()
    avatar = models.ImageField(upload_to='avatars/users/', null=True, blank=True)
    score = models.FloatField(default=0.0, null=False, blank=True)
    organization = models.CharField(max_length=2, choices=organizations)
class Profile2(models.Model):
    user = models.ForeignKey(User, null=True, unique=True)
    first_name = models.CharField(max_length=20, null=True, blank=True)
    last_name = models.CharField(max_length=20, null=True, blank=True)
    created = models.DateTimeField(auto_now_add=True)
    modified = models.DateTimeField(auto_now=True)
    about = models.TextField(null=True, blank=True)
    tags = models.ManyToManyField(Tag, null=True, blank=True)
    country = CountryField()
    avatar = models.ImageField(upload_to='avatars/users/', null=True, blank=True)
    score = models.FloatField(default=0.0, null=False, blank=True)
    ...
class Event(models.Model):
    title = models.CharField(max_length=200)
    sub_heading = models.CharField(max_length=200)
    presenters = ManyToManyField(Profile1, Profile2, blank=True, null=True)  # ?
    ...
    # I've also tried:
    profile1_presenters = models.ManyToManyField(Profile1, null=True, blank=True)
    profile2_presenters = models.ManyToManyField(Profile2, null=True, blank=True)
    # is there a better way to accomplish this?...
I think you have a design problem here. In my opinion, you must think about what a Presenter is and what the difference is between a Presenter with "profile 1" and one with "profile 2". What are you going to do with these models? Are you sure there are just two profiles? Is there any chance that, some time from now, a different profile ("profile 3") appears? And profile 4? And profile N?
I recommend that you think again about your models and their relations. Do NOT make this decision based on how difficult or easy it will be to handle these models from the Django admin. That's another problem, and I'll bet that if you think your models through a little, this won't be an issue later.
Nevertheless, I can give you some advice on how to accomplish what you want (or I hope so). Once you have thought about how to model these relations, start thinking about how you are going to write your models in Django. Here are some questions you will have to answer for yourself:
Do you need one different table (if you are going to use SQL) per profile?
If you cannot answer that, try to answer these:
1) What's the difference between the two profiles?
2) Is there more than one kind of profile?
3) Does each presenter have just one profile? What are the chances that this changes in the near future?
I don't know much about what you need, but I think the best option is to have a "Profile" model separate from your "Presenter" model. Maybe something like:
class Profile(models.Model):
    first_profile_field = ...
    second_profile_field = ...
# Each presenter has one profile. One profile can be used by
# zero or more presenters.
class Presenter(models.Model):
    first_presenter_field = ...
    second_presenter_field = ...
    profile = models.ForeignKey(Profile, on_delete=models.CASCADE)
class Event(models.Model):
    presenters = models.ManyToManyField(Presenter)
    ...
This is just an idea of how I imagine you could design your models. Here are some links that may help once you have designed your models and answered the questions I asked:
https://docs.djangoproject.com/en/dev/topics/db/models/#model-inheritance
https://docs.djangoproject.com/en/dev/misc/design-philosophies/#models
http://www.martinfowler.com/eaaCatalog/activeRecord.html
And to work with the admin once you decide how your design will be:
https://docs.djangoproject.com/en/dev/ref/contrib/admin/
EDIT:
If I'm not wrong, the only difference between the Profile1 and Profile2 fields is the "organization" field. Am I right? If so, I recommend merging both models, since they are almost the same. If they have different methods, or you want to add different managers or whatever, you can use Django's proxy model option. For example, you can do this:
class Profile(models.Model):
    # All the fields you listed above, including the "organization" field
    ...
# A proxy model must subclass the concrete model (Profile), not models.Model
class GoldenProfile(Profile):
    # you can define its own managers (GoldenProfileManager is yours to write)
    objects = GoldenProfileManager()
    ...
    class Meta:
        proxy = True
class SilverProfile(Profile):
    ...
    class Meta:
        proxy = True
This way, you can define different methods or the same method with a different behaviour in each model. You can give them their own managers, etcetera.
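Proxy models only run inside a Django project, so here is a plain-Python analogy of the idea rather than Django code: one kind of record (one table), a subclass that changes behaviour, and a class method standing in for a custom manager such as the hypothetical `GoldenProfileManager`. The organization codes are made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Profile:
    first_name: str
    organization: str  # hypothetical codes, e.g. "GO" or "SI"

    def label(self) -> str:
        return self.first_name


class GoldenProfile(Profile):
    ORGANIZATION = "GO"  # class attribute, not a dataclass field

    def label(self) -> str:
        # Same data, different behaviour: the point of a proxy model.
        return f"{self.first_name} (golden)"

    @classmethod
    def select(cls, profiles: Iterable[Profile]) -> List["GoldenProfile"]:
        # Rough stand-in for a manager that filters on organization.
        return [
            cls(p.first_name, p.organization)
            for p in profiles
            if p.organization == cls.ORGANIZATION
        ]
```

Every profile still lives in one collection (one table in Django), so a single many-to-many field on Event can point at all of them.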
And the event class should stay like this:
class Event(models.Model):
    title = models.CharField(max_length=200)
    sub_heading = models.CharField(max_length=200)
    # null has no effect on a ManyToManyField, so only blank=True is needed
    presenters = models.ManyToManyField(Profile, blank=True)
Hope it helps!