Refactoring: how to remove a model?


I have a model which is causing too much complexity, and so I want to do away with it and move to a simpler way of doing things. I don't immediately want to scrap the data in this database table, though.
class PRSblock(models.Model):
    PRS = models.ForeignKey('jobs.PRS2', models.CASCADE, related_name='prs_blocks')
    # no other relational fields
So, first, change the related name prs_blocks to obsolete_prs_blocks (and migrate), and then in the PRS model add a @property prs_blocks that asserts it is never called (to trap any bits of code I failed to remove).
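A minimal sketch of the trap property described in the first step. It uses a plain Python class standing in for the jobs.PRS2 model so that it runs without a Django project; in the real code the property would sit on the model class itself. It raises an exception rather than using a bare assert, since assert statements are skipped when Python runs with -O.

```python
class PRS2:
    """Stand-in for the jobs.PRS2 model (a plain class here, not a Django model)."""

    @property
    def prs_blocks(self):
        # Any leftover code that still uses the old related name fails loudly.
        raise RuntimeError(
            "prs_blocks is obsolete; the relation is now obsolete_prs_blocks"
        )


prs = PRS2()
try:
    prs.prs_blocks
except RuntimeError as exc:
    print(f"trapped: {exc}")
```

Any code path that still touches prs_blocks then fails immediately instead of quietly using the old relation.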
Second, rename the model PRSblock to obsolete_PRSblock. IIRC Django makemigrations will ask whether I renamed it, and if I say yes, it will preserve the database table.
Does this sound sensible, or are there any gotchas I haven't thought of?

Related

Should I use JSONField over ForeignKey to store data?

I'm facing a dilemma: I'm creating a new product and I don't want to mess up the way I organise the information in my database.
I have two choices for my models. The first would be to use foreign keys to link them together:
class Page(models.Model):
    data = models.JSONField()
class Image(models.Model):
    page = models.ForeignKey(Page, on_delete=models.CASCADE)
    data = models.JSONField()
class Video(models.Model):
    page = models.ForeignKey(Page, on_delete=models.CASCADE)
    data = models.JSONField()
# etc.
The second is to keep everything in Page's JSONField:
class Page(models.Model):
    data = models.JSONField()  # videos, pictures, etc. are stored here
Is one better than the other, and why? This would be a huge help in how I organise my databases in the future.
I thought the second option could be slower, since every time something changes the whole JSON document would be rewritten, but does it make a big difference, or is that wrong?

A JSONField obfuscates the underlying data, making it difficult to write readable code and fully use Django's built-in ORM, validations and other niceties (ModelForms for example). While it gives flexibility to save anything you want to the db (e.g. no need to migrate the db when adding new fields), it takes away the clarity of explicit fields and makes it easy to introduce errors later on.
For example, if you start saving a new key in your data and then try to access that key in your code, older objects won't have it and you might find your app crashing depending on which object you're accessing. That can't happen if you use a separate field.
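To make that failure mode concrete, here is a plain-Python sketch in which dicts stand in for the data JSONField of two Page rows. The video_count key is a hypothetical key added after the older row was saved.

```python
# Two rows' worth of JSONField content: the older one predates the new key.
old_page_data = {"title": "About us"}
new_page_data = {"title": "Gallery", "video_count": 3}


def video_count(data):
    # Direct indexing raises KeyError for rows saved before the key existed.
    return data["video_count"]


def video_count_safe(data):
    # Every reader must remember to supply a default instead.
    return data.get("video_count", 0)


print(video_count_safe(old_page_data), video_count_safe(new_page_data))  # 0 3
try:
    video_count(old_page_data)
except KeyError as exc:
    print("missing key:", exc)
```

With an explicit model field, the migration that adds the column also supplies a default (or NULL) for existing rows, so readers do not need this defensive code.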
I would always try to avoid it unless there's no other way.
Typically I use a JSONField in two cases:
To save a response from 3rd party APIs (e.g. as an audit trail)
To save references to archived objects (e.g. when the live products in my db change but I still have orders referencing the product).
If you use PostgreSQL, it's a relational database optimised to be very fast at JOINs, so using ForeignKeys is actually a good thing. Use select_related and prefetch_related in your code to reduce the number of queries made; the queries themselves will scale well even to millions of rows.
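The reason select_related helps can be approximated outside Django with the standard-library sqlite3 module. The sketch below fetches each image's page either with one extra query per image (the pattern you get by accessing image.page in a loop without select_related) or with a single JOIN, which is the kind of query select_related('page') produces for a forward ForeignKey. Table and column names are illustrative rather than Django's generated names, and a trace callback counts the statements executed; prefetch_related, which issues a separate query per relation instead of a JOIN, is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE image (id INTEGER PRIMARY KEY,
                        page_id INTEGER REFERENCES page(id), url TEXT);
    INSERT INTO page VALUES (1, 'Home'), (2, 'Gallery');
    INSERT INTO image VALUES (1, 1, 'a.png'), (2, 2, 'b.png'), (3, 2, 'c.png');
""")

executed = []
conn.set_trace_callback(executed.append)  # records every SQL statement run

# N+1 pattern: one query for the images, then one query per image for its page.
executed.clear()
n_plus_one = []
for page_id, url in conn.execute("SELECT page_id, url FROM image ORDER BY id"):
    (title,) = conn.execute("SELECT title FROM page WHERE id = ?", (page_id,)).fetchone()
    n_plus_one.append((url, title))
n_plus_one_queries = len(executed)

# JOIN pattern: a single query returns each image together with its page.
executed.clear()
joined = conn.execute(
    "SELECT image.url, page.title FROM image "
    "JOIN page ON page.id = image.page_id ORDER BY image.id"
).fetchall()
join_queries = len(executed)

print(n_plus_one_queries, join_queries)
```

The JOIN version stays at one query however many images there are, while the loop grows by one query per row.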

Do Django models really need a single unique key field?

Some of my models are only unique in a combination of keys. I don't want to use an auto-numbering id as the identifier, because subsets of the data will be exported to other systems (such as spreadsheets), modified, and then used to update the master database.
Here's an example:
class Statement(models.Model):
    supplier = models.ForeignKey(Supplier, on_delete=models.CASCADE)
    total = models.DecimalField("statement total", max_digits=10, decimal_places=2)
    statement_date = models.DateField("statement date")
    # ...
class Invoice(models.Model):
    supplier = models.ForeignKey(Supplier, on_delete=models.CASCADE)
    amount = models.DecimalField("invoice total", max_digits=10, decimal_places=2)
    invoice_date = models.DateField("date of invoice")
    statement = models.ForeignKey(Statement, on_delete=models.CASCADE, blank=True, null=True)
    # ...
Invoice records are only unique for a combination of supplier, amount and invoice_date.
I'm wondering if I should create a slug for Invoice based on supplier, amount and invoice_date so that it is easy to identify the correct record.
An example of the problem with needing multiple related fields to identify the right record is django-csvimport, which assumes there is only one related field and will not discriminate on two when building the foreign key links.
Yet the slug seems a clumsy option and needs some kind of management to rebuild the slugs after adding records in bulk.
I'm thinking this must be a common problem and maybe there's a best practice design pattern out there somewhere.
I am using PostgreSQL in case anyone has a database solution. Although I'd prefer to avoid that if possible, I can see that it might be the way to build my slug if that's the way to go, perhaps with trigger functions. That just feels a bit like hidden functionality though, and may cause a headache for setting up on a different server.
UPDATE - after reading initial replies
My application requires that data may be exported, modified remotely, and merged back into the master database after review and approval. Hidden autonumber keys don't survive that round trip consistently: the relation "invoices[2417] is part of statements[265]" no longer holds if the statement table is emptied and reloaded from a CSV.
If I use the numeric autonumber pk, then any process that updates the database would need to refresh the related key numbers, or else look them up with a multi-column WHERE clause.
If I create a slug that is based on my 3 keys but easy to reproduce, then I can use it as the key, albeit clumsily. I'm thinking of a slug along these lines:
u'%s %s %s' % (self.supplier,
               self.statement_date.strftime("%Y-%m-%d"),
               self.total)
This seems quite clumsy and not very DRY, as I expect I may have to recreate the slug elsewhere, duplicating the algorithm (maybe in an Excel formula or an Access query).
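Within the Python code base, one way to keep the slug DRY is to put the format in a single helper that the model, any CSV import script and the tests all call. A minimal sketch, where statement_key is a hypothetical function name and the inputs are plain values rather than model instances; it cannot stop an Excel formula or Access query from drifting out of sync.

```python
from datetime import date
from decimal import Decimal


def statement_key(supplier, statement_date, total):
    """Build the natural key for a Statement from its three identifying values.

    Keeping the format in one place means every Python caller produces
    exactly the same string for the same statement.
    """
    return "%s %s %s" % (supplier, statement_date.strftime("%Y-%m-%d"), total)


key = statement_key("Acme Ltd", date(2012, 3, 31), Decimal("1234.50"))
print(key)  # Acme Ltd 2012-03-31 1234.50
```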
I thought there must be a better way I'm missing but it looks like yuvi's reply means there should be, and there will be, but not yet :-(

What you're talking about is a multi-column primary key, otherwise known as a "composite" or "compound" key. Support for composite keys in Django is still in the works; you can read about it here:
Currently Django models only support a single column in this set,
denying many designs where the natural primary key of a table is
multiple columns [...] Current state is that the issue is
accepted/assigned and being worked on [...]
The link also mentions a partial implementation which is django-compositekeys. It's only partial and will cause you trouble with navigating between relationships:
support for composite keys is missing in ForeignKey and
RelatedManager. As a consequence, it isn't possible to navigate
relationships from models that have a composite primary key.
So currently it isn't entirely supported, but will be in the future. Regarding your own project, you can make of that what you will, though my own suggestion is to stick with the fully supported default of a hidden auto-incremented field that you don't even need to think about (and use unique_together to enforce the uniqueness of the described fields instead of making them your primary keys).
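On the database side, unique_together = ('supplier', 'amount', 'invoice_date') (or, in newer Django versions, a UniqueConstraint over the same fields) amounts to a composite UNIQUE constraint alongside the surrogate id key. The sketch below shows that effect with the standard-library sqlite3 module; the table is a simplified, hypothetical version of the Invoice model, not the exact schema Django would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE invoice (
        id INTEGER PRIMARY KEY,          -- hidden auto-increment surrogate key
        supplier_id INTEGER NOT NULL,
        amount TEXT NOT NULL,            -- decimal kept as text for exactness
        invoice_date TEXT NOT NULL,
        UNIQUE (supplier_id, amount, invoice_date)  -- the natural key
    )
""")

insert_sql = "INSERT INTO invoice (supplier_id, amount, invoice_date) VALUES (?, ?, ?)"
row = (7, "250.00", "2012-03-01")
conn.execute(insert_sql, row)

try:
    # Same supplier, amount and date: rejected even though the id would differ.
    conn.execute(insert_sql, row)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# After a CSV round trip, the natural key still finds the surrogate id.
(invoice_id,) = conn.execute(
    "SELECT id FROM invoice WHERE supplier_id = ? AND amount = ? AND invoice_date = ?",
    row,
).fetchone()
print(duplicate_rejected, invoice_id)
```

This keeps the id for relations while still letting an import process match rows on the three business fields.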
I hope this helps!

No.
A model needs exactly one field with primary_key=True. By default this is the hidden AutoField (id) that stores the object's ID, but you can set primary_key=True on any other field instead.
I've done this in cases where I was building a Django project on top of tables that had previously been created manually or by other frameworks/systems.
In reality, you can use whatever means you can think of to join objects together in queries. As long as the query returns data that can be associated with your models, it does not really matter which field you use for the joins. Just keep in mind that the solution you choose should be as efficient as possible.
Alan

Are there performance advantages by splitting a Django model/table into two models/tables?

In SO question 7531153, I asked the proper way to split a Django model into two: either using Django's multi-table inheritance or explicitly defining a OneToOneField.
Based on Luke Sneeringer's comment, I'm curious whether there's a performance gain from splitting the model in two.
The reason I was thinking about splitting the model in two is because I have some fields that will always be completed, while there are other fields that will typically be empty (until the project is closed).
Are there performance gains from putting typically empty fields, such as actual_completion_date and actual_project_costs, into a separate model/table in Django?
Split into Two Models
class Project(models.Model):
    project_number = models.SlugField(max_length=5, blank=False,
                                      primary_key=True)
    budgeted_costs = models.DecimalField(max_digits=10, decimal_places=2)
    submitted_on = models.DateField(auto_now_add=True)
class ProjectExtendedInformation(models.Model):
    project = models.OneToOneField(Project, on_delete=models.CASCADE, primary_key=True)
    actual_completion_date = models.DateField(blank=True, null=True)
    actual_project_costs = models.DecimalField(max_digits=10, decimal_places=2,
                                               blank=True, null=True)

Actually, quite the opposite. Any time multiple tables are involved, a SQL JOIN will be required, which is inherently slower for a database to perform than a simple SELECT query. The fact that the fields are empty is meaningless in terms of performance one way or another.
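To see where that JOIN comes from, here is a sqlite3 sketch of the two-table layout from the question (simplified, with every value stored as text). Reading a project together with its optional extended information needs a LEFT JOIN, and projects that are not closed yet simply have no row in the second table, so its columns come back as NULL (None).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (
        project_number TEXT PRIMARY KEY,
        budgeted_costs TEXT NOT NULL,
        submitted_on TEXT NOT NULL
    );
    -- At most one row per project, sharing the project's primary key.
    CREATE TABLE project_extended (
        project_id TEXT PRIMARY KEY REFERENCES project(project_number),
        actual_completion_date TEXT,
        actual_project_costs TEXT
    );
    INSERT INTO project VALUES ('P0001', '1000.00', '2011-09-01'),
                               ('P0002', '2500.00', '2011-09-15');
    INSERT INTO project_extended VALUES ('P0001', '2011-12-20', '980.00');
""")

# LEFT JOIN keeps open projects that have no extended row yet.
rows = conn.execute("""
    SELECT p.project_number, p.budgeted_costs, e.actual_completion_date
    FROM project AS p
    LEFT JOIN project_extended AS e ON e.project_id = p.project_number
    ORDER BY p.project_number
""").fetchall()
for row in rows:
    print(row)
```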
Depending on the size of the table and the number of columns, it may be faster to only select a subset of fields that you need to interact with, but that's easy enough in Django with the only method:
Project.objects.only('project_number', 'budgeted_costs', 'submitted_on')
Which produces something akin to:
SELECT project_number, budgeted_costs, submitted_on FROM yourapp_project;
Using separate models (and tables) only makes sense for the purposes of modularization -- such that you subclass Project to create a specific kind of project that requires additional fields but still needs all the fields of a generic Project.

For your case, if there's some info that's available only when it's closed, I'd indeed advise making a separate model.
Joins aren't bad. Especially in your case, the join will be faster if you have all rows in one table and far fewer rows in the other. I've worked with databases a lot, and in most cases it's pure guesswork to say whether a join will be better or worse; in many cases even a full table scan beats using an index. If performance is a concern, look at the EXPLAIN output and profile the DB work if possible (I know Oracle supports this). But until performance becomes an issue, I prefer quicker development.
We have a table in Django with 5M rows, and we needed a column that would have been non-null for only 1K of them. Just altering the table would have taken half a day, and rebuilding it from scratch also takes a few hours, so we chose to make a separate model.
I've been to a lecture on Domain Driven Design in which the author explained that it is important, especially in development of a new app, to separate models, to not stuff everything in one class.
Let's say you have a CargoAircraft class and PassengerAircraft. It's so tempting to put them in one class and work "seamlessly", isn't it? But interactions with them (scheduling, booking, weight or capacity calculations) are completely different.
So by putting everything in one class, you force yourself into a bunch of if clauses in every method, extra methods in the Manager, harder debugging and bigger tables in the DB. Basically you spend more development time, and for the sake of what? Only two things: 1) fewer joins and 2) fewer class names.
If you separate the classes, things go much easier:
clean code: no ugly ifs, no getattr() calls with defaults
easy debugging
more maintainable database
hence, faster development.
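A minimal plain-Python sketch of the CargoAircraft/PassengerAircraft point: each separate class carries only its own attributes and its own capacity rule, while the single-class alternative needs a type flag, nullable fields and an if. The attribute names and capacity rules are hypothetical, chosen only to show the shape of the code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CargoAircraft:
    max_payload_kg: float

    def can_carry(self, load_kg: float) -> bool:
        # Cargo capacity is a weight limit.
        return load_kg <= self.max_payload_kg


@dataclass
class PassengerAircraft:
    seats: int

    def can_carry(self, passengers: int) -> bool:
        # Passenger capacity is a seat count.
        return passengers <= self.seats


@dataclass
class Aircraft:
    """The single-class alternative: a type flag, nullable fields and an if."""
    kind: str  # "cargo" or "passenger"
    max_payload_kg: Optional[float] = None
    seats: Optional[int] = None

    def can_carry(self, amount: float) -> bool:
        if self.kind == "cargo":
            return amount <= self.max_payload_kg
        return amount <= self.seats


print(CargoAircraft(max_payload_kg=20000).can_carry(15000))  # True
print(PassengerAircraft(seats=180).can_carry(200))          # False
```

Every new aircraft type adds another branch to Aircraft.can_carry, whereas the separate classes stay independent of each other.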

What are the advantages of using ForeignKey in Django?

This is an extremely naive question. As you can tell, it comes from someone who doesn't know much about either databases or Django.
What are the advantages of using ForeignKeys in Django?
Maybe an example will help me understand better. I have tables like this already:
City:
    id = IntegerField()  # e.g. 15
    name = CharField()  # e.g. 'Rome'
Country:
    name = CharField()  # e.g. 'Italy'
    capital = IntegerField()  # e.g. 15
Should I bother changing capital to ForeignKey(City), and if so, why? Do certain things become quicker, more convenient, or otherwise better?
thanks!

Foreign keys are constraints on your data model that allow you to ensure consistency. Basically, in your example, if you didn't have capital as a ForeignKey to a City, it could potentially contain the id of a City that doesn't exist! When you use ForeignKey instead, it places a constraint on the database so that you cannot remove things that are currently referenced by other things. So if you tried to delete the City named "Rome" before deleting the Country named "Italy", which has that city as its capital, it wouldn't allow you to.
Using ForeignKey instead would make sure you never had to worry about whether or not the "thing" on the other end of the relationship was still there or not.
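The constraint can be demonstrated with the standard-library sqlite3 module, using the City/Country tables from the question. Two assumptions of this sketch: SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, and in Django the outcome of deleting a referenced row is also governed by the on_delete argument, which the ORM applies before the database constraint comes into play (with CASCADE, for instance, Italy would be deleted along with Rome).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE country (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        capital_id INTEGER REFERENCES city(id)   -- the ForeignKey(City) column
    );
    INSERT INTO city VALUES (15, 'Rome');
    INSERT INTO country (name, capital_id) VALUES ('Italy', 15);
""")


def try_sql(sql, params=()):
    """Run a statement and report whether the database accepted it."""
    try:
        conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


delete_result = try_sql("DELETE FROM city WHERE name = 'Rome'")
insert_result = try_sql("INSERT INTO country (name, capital_id) VALUES ('Atlantis', 999)")
print(delete_result)
print(insert_result)
```

Both statements are refused: Rome cannot be removed while Italy points at it, and a capital id with no matching city cannot be stored.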

Using ForeignKeys lets Django "know" about the relations, so you can follow them without caring how they are stored. For example, you can do the following:
country = Country.objects.get(pk=1)
print(country.capital.name)
italy = Country.objects.get(capital__name="Rome")
To keep constraints intact, Django will also follow the relations when deleting objects that are referenced by a ForeignKey. Django also keeps track of the reverse relationships (without you needing to define them explicitly), so you can do something like countries = rome.country_set.all() (which doesn't make much sense in this example, since a OneToOneField would be more appropriate here).
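For a feel of what the ORM does with the relation, the sketch below runs, against a tiny sqlite3 copy of the City/Country tables, roughly the SQL that Country.objects.get(capital__name="Rome") and rome.country_set.all() translate to: a JOIN across the foreign key for the first, and a filter on the capital_id column for the second. The SQL Django actually emits differs in quoting and column lists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                          capital_id INTEGER REFERENCES city(id));
    INSERT INTO city VALUES (15, 'Rome'), (16, 'Paris');
    INSERT INTO country VALUES (1, 'Italy', 15), (2, 'France', 16);
""")

# Forward lookup across the relation, like Country.objects.get(capital__name="Rome").
italy = conn.execute("""
    SELECT country.id, country.name FROM country
    INNER JOIN city ON country.capital_id = city.id
    WHERE city.name = ?
""", ("Rome",)).fetchone()

# Reverse relation, like rome.country_set.all(): a filter on the FK column.
rome_id = 15
countries = conn.execute(
    "SELECT id, name FROM country WHERE capital_id = ?", (rome_id,)
).fetchall()
print(italy, countries)
```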

Referential integrity. On the other hand, it is quite possible to have a database that neither knows nor cares about FKs; in fact, I work with a legacy DB like this at work.

Designing a database for a user/points system? (in Django)

First of all, sorry if this isn't an appropriate question for StackOverflow. I've tried to make it as generalisable as possible.
I want to create a database (MySQL, site running Django) that has users, who can be allocated a certain number of points for various types of action - it's a collaborative game. My requirements are to obtain:
the number of points a user has
the user's ranking compared to all other users
and the overall leaderboard (i.e. all users ranked in order of points)
This is what I have so far, in my Django models.py file:
class SiteUser(models.Model):
    name = models.CharField(max_length=250)
    email = models.EmailField(max_length=250)
    date_added = models.DateTimeField(auto_now_add=True)
    def points_total(self):
        # Sum the points of every action this user has been credited with.
        points_added = PointsAdded.objects.filter(user=self)
        points_total = 0
        for point in points_added:
            points_total += point.points()
        return points_total
class PointsAdded(models.Model):
    user = models.ForeignKey('SiteUser', on_delete=models.CASCADE)
    action = models.ForeignKey('Action', on_delete=models.CASCADE)
    date_added = models.DateTimeField(auto_now_add=True)
    def points(self):
        # The points value is stored on the related Action.
        return self.action.points
class Action(models.Model):
    points = models.IntegerField()
    action = models.CharField(max_length=36)
However, it's rapidly becoming clear to me that it's actually quite complex (in Django query terms, at least) to work out a user's ranking and return the leaderboard of users. At least, I'm finding it tough. Is there a more elegant way to do something like this?
This question seems to suggest that I shouldn't even have a separate points table - what do people think? It feels more robust to have separate tables, but I don't have much experience of database design.
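Part of the difficulty is doing the summing in a Python loop. The totals and the ordering can instead come from one aggregate query; the sketch below runs it with the standard-library sqlite3 module against simplified tables mirroring SiteUser, PointsAdded and Action (the table names, the label column and the data are hypothetical). In the ORM, the same aggregation could be expressed with annotate() and Sum() over the pointsadded__action__points path, assuming the default related query name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE site_user (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE point_action (id INTEGER PRIMARY KEY, points INTEGER NOT NULL, label TEXT);
    CREATE TABLE points_added (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES site_user(id),
        action_id INTEGER NOT NULL REFERENCES point_action(id)
    );
    INSERT INTO site_user VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO point_action VALUES (1, 10, 'post'), (2, 3, 'comment');
    INSERT INTO points_added (user_id, action_id) VALUES (1, 1), (1, 2), (2, 1), (2, 1);
""")

# Total per user in one query; LEFT JOINs keep users who have no points yet.
leaderboard = conn.execute("""
    SELECT u.name, COALESCE(SUM(a.points), 0) AS total
    FROM site_user AS u
    LEFT JOIN points_added AS pa ON pa.user_id = u.id
    LEFT JOIN point_action AS a ON a.id = pa.action_id
    GROUP BY u.id, u.name
    ORDER BY total DESC, u.name
""").fetchall()


def rank(name):
    """Competition-style rank: 1 + the number of users with a higher total."""
    totals = dict(leaderboard)
    return 1 + sum(1 for total in totals.values() if total > totals[name])


print(leaderboard)
print(rank("alice"))
```

This still recomputes every total on each request, which is what the answers below avoid by storing the total on the user.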

This is old, but I'm not sure exactly why you have two separate tables (PointsAdded and Action). It's late, so maybe my mind isn't ticking, but it seems like you just split one table into two for some reason, and you don't get any benefit out of it. It's not like there's a one-to-many relationship in it, right?
So first of all, I would combine those two tables. Secondly, you are probably better off storing points_total as a field in your site_user table. This is what I think Demitry is trying to allude to, but didn't say explicitly. That way, instead of running a whole additional query (pulling everything a user has done in their history on the site is expensive) plus a loop over the results (which is even more expensive), you can just read it as one field. It's denormalizing the data for the greater good.
Just be sure to update the value every time you add something that has points. You can use Django's post_save signal to do that.
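A sketch of the denormalised approach with sqlite3: a points_total column on the user table is updated in the same transaction that records the award, which is the job the answer gives to a post_save handler in Django. With the total stored, the leaderboard is a plain ORDER BY and a user's rank is a single COUNT query. The table names and the award and rank helpers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE site_user (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                            points_total INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE points_added (id INTEGER PRIMARY KEY,
                               user_id INTEGER NOT NULL REFERENCES site_user(id),
                               points INTEGER NOT NULL);
    CREATE INDEX site_user_points ON site_user(points_total);
    INSERT INTO site_user (id, name) VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
""")


def award(user_id, points):
    """Record an award and keep the cached total in step (the post_save role)."""
    with conn:  # both writes commit together, or neither does
        conn.execute("INSERT INTO points_added (user_id, points) VALUES (?, ?)",
                     (user_id, points))
        conn.execute("UPDATE site_user SET points_total = points_total + ? WHERE id = ?",
                     (points, user_id))


def rank(user_id):
    """1 + the number of users with strictly more points (ties share a rank)."""
    (r,) = conn.execute("""
        SELECT COUNT(*) + 1 FROM site_user
        WHERE points_total > (SELECT points_total FROM site_user WHERE id = ?)
    """, (user_id,)).fetchone()
    return r


for user_id, pts in [(1, 10), (1, 3), (2, 10), (2, 10)]:
    award(user_id, pts)

leaderboard = conn.execute(
    "SELECT name, points_total FROM site_user ORDER BY points_total DESC, name"
).fetchall()
print(leaderboard, rank(1))
```

The history table is kept, so the cached totals can always be rebuilt from it if they are ever suspected to be wrong.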

It's a bit more difficult to have points saved in the same table, but it's totally worth it. You can do very simple ordering/filtering operations if you have computed points total on user model. And you can count totals only when something changes (not every time you want to show them). Just put some validation logic into post_save signals and make sure to cover this logic with tests and you're good.
P.S. See the Wikipedia article on denormalization.
