Backstory
Data is being pulled from an accounting system that can have department, market, and client relationship data associated with it. The relationship data are all TEXT/CHAR fields. They are not integer columns. There are over 2 million rows.
The problem
The problems I've been running into are: how to add the lines without relationship validation, which could fail because the related table (such as Market) is missing the value or the value has changed (because we are looking far back into the past); how the columns are named in the Django database (more detail below); and how to query Django models with a join when the class has no relationship attribute.
models.py
from datetime import date
from decimal import Decimal
from typing import Optional

from django.db import models


class Line(models.Model):
    entry_number: int = models.IntegerField(db_index=True)
    posting_date: date = models.DateField()
    document_number: Optional[str] = models.CharField(max_length=150, null=True, default=None)
    description: Optional[str] = models.CharField(max_length=150, null=True, default=None)
    department: Optional[str] = models.CharField(max_length=150, null=True, default=None)
    market: Optional[str] = models.CharField(max_length=150, null=True, default=None)
    amount: Decimal = models.DecimalField(max_digits=18, decimal_places=2)
    client: Optional[str] = models.CharField(max_length=150, null=True, default=None)
    # This relationship works (Account is defined elsewhere in the project)
    account = models.ForeignKey(Account, on_delete=models.DO_NOTHING, related_name='lines')


class Department(models.Model):
    code: str = models.CharField(max_length=10, db_index=True, unique=True, primary_key=True)
    name: str = models.CharField(max_length=100, null=True)


class Market(models.Model):
    code: str = models.CharField(max_length=10, db_index=True, unique=True, primary_key=True)
    name: str = models.CharField(max_length=100, null=True)
The Line table is populated by an SQL statement that pulls the data from the accounting system.
In SQLAlchemy
What I am looking for is something like the following, which can be expressed in SQLAlchemy:
class Line(...):
    # snipped
    # foreign() marks Line's own text column as the "foreign key" side, which makes
    # each relationship many-to-one (a single related object per line).
    client_rel: Client = relationship("Client", primaryjoin=foreign(client) == Client.code, viewonly=True)
    department_rel: Department = relationship("Department", primaryjoin=foreign(department) == Department.code, viewonly=True)
    market_rel: Market = relationship("Market", primaryjoin=foreign(market) == Market.code, viewonly=True)
    # snipped the other related classes
SQLAlchemy's primaryjoin allows relationships that are enforced only at the Python level, not in the database. It also allows the relationship's loose foreign key column to be named almost anything. As far as I can tell, Django's foreign key column has to be "name"_id and can't be changed.
Related
I had a database in PHP/HTML using MySQL and am moving it to a Django project.
I have all the functionality working, but loading a table of the data I want is extremely slow because of the relations with other tables.
After searching for days, I know that I probably have to use a models.Manager to use prefetch_related. However, I am stuck on how to use this in my template.
I have the following models (simplified):
from django.db import models
from django.db.models import Prefetch


class OrganisationManager(models.Manager):
    def get_queryset_director(self):
        person_query = Position.objects.select_related('person').filter(
            position_current=True, position_type="director"
        )
        return super().get_queryset().prefetch_related(
            Prefetch('position_set', queryset=person_query, to_attr="position_list")
        )

    def get_queryset_president(self):
        person_query = Position.objects.select_related('person').filter(
            position_current=True, position_type="president"
        )
        return super().get_queryset().prefetch_related(
            Prefetch('position_set', queryset=person_query, to_attr="position_list")
        )


class Person(models.Model):
    full_name = models.CharField(max_length=255, blank=True, null=True)
    country = models.ForeignKey(Country, models.CASCADE, blank=True, null=True)
    birth_date = models.DateField(blank=True, null=True)


class Organisation(models.Model):
    organisation_name = models.CharField(max_length=255, blank=True, null=True)
    positions = models.ManyToManyField(Person, through='Position')
    # Positions are dynamic: even though there should be only one director and one
    # president at any given time, a OneToOne model wouldn't work in this scenario.

    objects = OrganisationManager()

    # The following methods are currently used to show the names and start dates
    # of the director and president in the DetailView and ListView.
    def director(self):
        return self.position_set.filter(position_current=True, position_type="director").last()

    def president(self):
        # position_type must match the stored choice value "president"
        return self.position_set.filter(position_current=True, position_type="president").last()


class Position(models.Model):
    POSITION_TYPES = (
        ('president', 'President'),
        ('director', 'Director'),
    )
    person = models.ForeignKey(Person, on_delete=models.CASCADE)
    organisation = models.ForeignKey(Organisation, on_delete=models.CASCADE)
    position_type = models.CharField(max_length=255, choices=POSITION_TYPES, blank=True, null=True)
    position_current = models.BooleanField(default=False)
    position_start = models.CharField(max_length=10, blank=True, null=True)
I want my table to look like this:
Organisation Name | President          | President Start Date | Director          | Director Start Date
Organisation 1    | President of org 1 | 2013                 | Director of org 1 | 2015
Organisation 2    | President of org 2 | 2018                 | Director of org 2 | 2017
With my current code everything works, but because it queries the database for every row, the page is so slow that it even makes Heroku time out.
I don't understand how to use the prefetch query from the models.Manager in the table (ListView) template. Thanks!
One way to get the results you want is to use subqueries, something like:
from django.db.models import OuterRef, Subquery

# .last() can't be used here: it runs the query immediately, which is not allowed for a
# queryset that references OuterRef. Ordering by -pk and slicing [:1] picks the same row.
president_subquery = Position.objects.filter(
    organisation=OuterRef('pk'), position_type='president', position_current=True
).order_by('-pk')
director_subquery = Position.objects.filter(
    organisation=OuterRef('pk'), position_type='director', position_current=True
).order_by('-pk')
Organisation.objects.annotate(
    president=Subquery(president_subquery.values('person__full_name')[:1]),
    president_start_date=Subquery(president_subquery.values('position_start')[:1]),
    director=Subquery(director_subquery.values('person__full_name')[:1]),
    director_start_date=Subquery(director_subquery.values('position_start')[:1]),
)
Example:
class Room(models.Model):
    # String references ('Floor', 'Building') let a model point at a class defined later in the file.
    assigned_floor = models.ForeignKey('Floor', null=True, on_delete=models.CASCADE)
    room_nr = models.CharField(db_index=True, max_length=4, unique=True, null=True)
    locked = models.BooleanField(db_index=True, default=False)
    last_cleaning = models.DateTimeField(db_index=True, auto_now_add=True, null=True)
    ...

class Floor(models.Model):
    assigned_building = models.ForeignKey('Building', on_delete=models.CASCADE)
    wall_color = models.CharField(db_index=True, max_length=255, blank=True, null=True)
    ...

class Building(models.Model):
    name = models.CharField(db_index=True, max_length=255, unique=True, null=True)
    number = models.PositiveIntegerField(db_index=True)
    color = models.CharField(db_index=True, max_length=255, null=True)
    ...
I want to output all rooms in a table sorted by Building.number.
For each room, I want to print:
Building.number, Building.color, Building.name, Floor.wall_color, Room.last_cleaning
I also want to allow optional filters on:
Room.locked, Room.last_cleaning, Floor.wall_color, Building.number, Building.color
With one table this is no problem for me, but I don't know how to achieve this with three tables.
kwargs = {'number': 123}
kwargs['color'] = 'blue'
all_buildings = Building.objects.filter(**kwargs).order_by('-number')
Can you please help me? Do I need to write raw SQL queries, or can I achieve this with Django's model query API?
I'm using the latest Django version with PostgreSQL.
No raw SQL needed:
# A double underscore (__) follows the foreign key on its left into the related
# model, so assigned_floor__wall_color filters on Floor.wall_color.
room_queryset = Room.objects.filter(assigned_floor__wall_color='blue')
for room in room_queryset:
    print(room.assigned_floor.assigned_building.number)
    print(room.assigned_floor.assigned_building.color)
    print(room.assigned_floor.assigned_building.name)
    print(room.assigned_floor.wall_color)
    print(room.last_cleaning)
Background
I'm storing data about researchers, e.g. researcher profiles, metrics for each researcher, the journals they published in, the papers they have, etc.
The Problem
My current database design is this:
Each Researcher has many journals (that they published in), and each journal has its own information.
The same goes for Subject Areas.
But currently this leads to massive data duplication, e.g. the same journal can appear many times in the Journal table, just linked to a different researcher.
Is there a better way to tackle this problem? Right now I have over 5000 rows in the Journal table but only about 1000 distinct journals.
Thank you!
EDIT: This is likely due to the way I'm saving the models for new data (shown below). Could anyone show the proper way to loop over dictionaries (hashes) and save them to models?
Model - Researcher
from django.db import models
from django.db.models import JSONField  # Django 3.1+; older PostgreSQL projects: django.contrib.postgres.fields


class Researcher(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)
    scopus_id = models.BigIntegerField(db_index=True)  # Index to make searches quicker
    academic_rank = models.CharField(max_length=100)
    title = models.CharField(max_length=200, default=None, blank=True, null=True)
    salutation = models.CharField(max_length=200, default=None, blank=True, null=True)
    scopus_first_name = models.CharField(max_length=100)
    scopus_last_name = models.CharField(max_length=100)
    affiliation = models.CharField(default=None, blank=True, null=True, max_length=255)
    department = models.CharField(default=None, blank=True, null=True, max_length=255)
    email = models.EmailField(default=None, blank=True, null=True)
    properties = JSONField(default=dict)

    def __str__(self):
        return "{} {}, Scopus ID {}".format(self.scopus_first_name, self.scopus_last_name, self.scopus_id)
Model - Journal
class Journal(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)
    researchers = models.ManyToManyField(Researcher)
    title = models.TextField()
    journal_type = models.CharField(max_length=40, default=None, blank=True, null=True)
    abbreviation = models.TextField(default=None, blank=True, null=True)
    issn = models.CharField(max_length=50, default=None, blank=True, null=True)
    journal_rank = models.IntegerField(default=None, blank=True, null=True)
    properties = JSONField(default=dict)

    def __str__(self):
        return self.title
How I'm currently saving them:
db_model_fields = {'abbreviation': 'Front. Artif. Intell. Appl.',
'issn': '09226389',
'journal_type': 'k',
                   'researchers': researcher_obj,  # hypothetical name for a Researcher instance (<Researcher: x, Scopus ID f>)
                   'title': 'Frontiers in Artificial Intelligence and Applications'}
# Remove researchers first, otherwise creating the Journal fails (an error saying the id needs to exist)
researcher = db_model_fields["researchers"]
del db_model_fields["researchers"]
model_obj = Journal(**db_model_fields)
model_obj.save()
model_obj.researchers.add(researcher)
model_obj.save()
Here is how it works:

class Journal(models.Model):
    # some fields
    ...

class Researcher(models.Model):
    # some fields
    journal = models.ManyToManyField(Journal)

Behind the scenes, Django creates an intermediary join table to represent the many-to-many relationship.
This join table will have many rows, which is how it is supposed to work, but each journal and each researcher appears only once in its own table.
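Your problem may come from how you save. Instead of:
model_obj = Journal(**db_model_fields)
model_obj.save()
try this:
# get_or_create takes field lookups and returns an (object, created) tuple;
# matching on the title is an assumption, use whatever identifies a journal for you
model_obj, created = Journal.objects.get_or_create(title=db_model_fields['title'], defaults=db_model_fields)
This way you get the existing journal if there is one. Because none of your fields are unique, your current code creates a new journal every time, and Django doesn't complain because it generates a new unique id for each new journal.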
I'm working on removing an existing GenericForeignKey relationship from some models. I'd like to change it to the Reformatted Model below. Do migrations provide a way to convert the existing content_type and object_id values into the respective new ForeignKeys (to keep existing data)? I'm basically brand new at programming, so pardon me if this is a stupid question.
class Donation(models.Model):
    amount_id = models.CharField(max_length=12, unique=True, editable=False)
    date_issued = models.DateField(auto_now_add=True)
    description = models.TextField(blank=True, null=True)
    content_type = models.ForeignKey(ContentType)
    object_id = models.PositiveIntegerField()
    content_object = generic.GenericForeignKey('content_type', 'object_id')


class Individual(BaseModel):
    first_name = models.CharField(max_length=50)
    middle_name = models.CharField(max_length=50, blank=True, null=True)
    last_name = models.CharField(max_length=50)
    suffix = models.CharField(max_length=50, blank=True, null=True)
    contributor = generic.GenericRelation(Donation, related_query_name='individual')


class Organization(models.Model):
    name = models.CharField(max_length=100)
    contributor = generic.GenericRelation(Donation, related_query_name='organization')
Reformatted Model
class Donation(models.Model):
    amount_id = models.CharField(max_length=12, unique=True, editable=False)
    date_issued = models.DateField(auto_now_add=True)
    description = models.TextField(blank=True, null=True)
    contributor_group = models.ForeignKey(Organization, null=True, blank=True, on_delete=models.CASCADE)
    contributor_individual = models.ForeignKey(Individual, null=True, blank=True, on_delete=models
Based on your definition of the Donation model, one of the fields contributor_group and contributor_individual will always be null after the migration.
I hope you have taken that into consideration.
To be safe, do this in two phases:
1. Keep content_type and object_id and add the two new fields.
2. Once the new fields are populated, remove the generic fields.
There are two ways to populate those new fields.
Django migrations let you populate new fields with values during the migration itself (a data migration using RunPython). You can look it up; I haven't done that before either.
For more control (and some learning as well), you can also populate the fields with a script. Set up the django-extensions module in your project and write a script that does the population for you (django-extensions runs it with its runscript command). A sample script would look like this:
from django.contrib.contenttypes.models import ContentType

from myproject.models import Donation, Individual, Organization


def run():
    organization_content_type = ContentType.objects.get_for_model(Organization)
    individual_content_type = ContentType.objects.get_for_model(Individual)
    donations = Donation.objects.all()
    for donation in donations:
        if donation.content_type_id == organization_content_type.id:
            # Assign the raw id to the *_id attribute; the FK attribute itself expects an instance.
            donation.contributor_group_id = donation.object_id
        elif donation.content_type_id == individual_content_type.id:
            donation.contributor_individual_id = donation.object_id
        else:
            print("Can't identify content type for donation id {}".format(donation.id))
        donation.save()
Check that the values are correct, then remove the generic fields.
I am trying to export my whole database with prefetch_related, but I only get data from the main model.
My models:
from django.db import models

# on_delete is required since Django 2.0; CASCADE was the implicit default before that.


class GvtCompoModel(models.Model):
    gvtCompo = models.CharField(max_length=1000, blank=False, null=False)
    ...


class ActsIdsModel(models.Model):
    # IntegerField ignores max_length, so it is dropped here
    year = models.IntegerField(blank=False, null=False)
    ...


class RespProposModel(models.Model):
    respPropos = models.CharField(max_length=50, unique=True)
    nationResp = models.ForeignKey('NationRespModel', on_delete=models.CASCADE, blank=True, null=True, default=None)
    nationalPartyResp = models.ForeignKey('NationalPartyRespModel', on_delete=models.CASCADE, blank=True, null=True, default=None)
    euGroupResp = models.ForeignKey('EUGroupRespModel', on_delete=models.CASCADE, blank=True, null=True, default=None)


class ActsInfoModel(models.Model):
    # id of the act
    actId = models.OneToOneField(ActsIdsModel, on_delete=models.CASCADE, primary_key=True)
    respProposId1 = models.ForeignKey('RespProposModel', on_delete=models.CASCADE, related_name='respProposId1', blank=True, null=True, default=None)
    respProposId2 = models.ForeignKey('RespProposModel', on_delete=models.CASCADE, related_name='respProposId2', blank=True, null=True, default=None)
    respProposId3 = models.ForeignKey('RespProposModel', on_delete=models.CASCADE, related_name='respProposId3', blank=True, null=True, default=None)
    gvtCompo = models.ManyToManyField(GvtCompoModel)
My view:
dumpDB = ActsInfoModel.objects.all().prefetch_related("actId", "respProposId1", "respProposId2", "respProposId3", "gvtCompo")
for act in dumpDB.values():
    for field in act:
        print("dumpDB field", field)
When I display "field", I see the fields from ActsInfoModel ONLY, the starting model. Is this normal?
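You haven't understood what prefetch_related does. Its arguments are relation lookups (the names of related fields), but prefetching only attaches related objects to model instances; .values() returns plain dictionaries of the model's own columns, so the prefetched data never appears there.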
(Note that your field naming convention is also very misleading: respProposId1 and actId are not IDs but instances of the related models. In each case Django creates the underlying field by appending _id, so the database columns are respProposId1_id and actId_id. You should just call the fields resp_propos1, resp_propos2, and so on; also note that the usual style for field names is lower_case_with_underscores, not camelCase.)
It is normal that you only see fields from ActsInfoModel. You can access the related models via dot notation, like this:
acts = ActsInfoModel.objects.all().prefetch_related("actId", "respProposId1", "respProposId2", "respProposId3", "gvtCompo")
for act in acts:
    if act.respProposId1 is not None:  # the foreign key is nullable
        print(act.respProposId1.respPropos)
Because the related models are already prefetched, this won't produce any additional queries. For reference, a quote from the docs:
Returns a QuerySet that will automatically retrieve, in a single batch, related objects for each of the specified lookups.