Django - Referential integrity on OneToOneField - django

I am trying to import a product feed (Product) into Django. I need to maintain a set of selected products (SelectedProduct) which also hold overrides for the product descriptions. I thought the best way to represent this is as a OneToOneField linking SelectedProduct with Product.
class Product(models.Model):
sku = models.CharField(max_length=32, primary_key=True)
title = models.CharField(max_length=200)
description = models.TextField(max_length=2000, blank=True, null=True)
class SelectedProduct(models.Model):
product = models.OneToOneField(Product, db_column='product_sku', on_delete=models.DO_NOTHING)
description = models.TextField(max_length=2000, blank=True, null=True)
For simplicity, each time the product feed is read, I am intending to delete all the products and re-import the whole product feed (within a transaction, so I can rollback if required).
However, I don't want to truncate the SelectedProduct at the same time, since this contains the descriptions which have been overridden. I was hoping that models.DO_NOTHING might help, but it doesn't.
I suppose I either need to temporarily disable the referential integrity while I import the feed (and delete any entries from SelectedProduct which would break the integrity) or I need to represent the relationship differently.
Any advice appreciated please :-)
Note - the above is a simplified representation. I have variants hanging off products too and will have selected variants overriding variant prices.

For the record, I went with the suggestion from #TimNyborg to use update_or_create. The data load now takes roughly 10 times as long, but as it's a batch import of a small-ish dataset the performance isn't an issue and it feels like a better solution overall.

Related

Django - Generic One-to-One Relation

I have a website with different kinds of activities:
Lessons ;
Exercises ;
Quizzes.
So far, each type of activity corresponds to a specific Model and a table in the database. I would like to create an ordered path through these activities. For instance:
Lesson 1
then Exercise 1
then Lesson 2
then Quizz 1
etc.
I am considering to create a model named Activity that would store the following data:
A number: the number of the activity in the path ;
A One-To-One relationship to one given activity (lesson, exercise, quizz etc.).
(1) I have seen that Django offers a GenericForeignKey to handle many-to-one relationship to different kinds of models, like many comments associated to a single lesson or a single exercise. Is there something similar for Generic OneToOne relationship?
(2) I would like to track the progress of my users. To do so, I am considering having a many-to-many relationship called "done_activities" built in my User model and linked to my Activity model. Do you think this is an efficient way of approaching the problem ?
I'm not sure you would need or want self-referential fields in this case. Consider the following structure as an example. I do not propose this 'in stone' as THE solution, but more to spur your own ideas about the solution you need. Please note that I'm leaving out __str__ methods and such for brevity:
ACTIVITY_TYPE = ['Lesson', 'Exercise', 'Quiz']
class Activity(models.Model):
activity_name = models.CharField(max_length=200, blank=True)
activity_type = models.CharField(max_length=100, choices=ACTIVITY_TYPE, blank=True, db_index=True)
activity_desc = models.CharField(max_length=200, blank=True) #description of the lesson, exercise, or quiz
class Program(models.Model):
program_name = models.CharField(max_length=200, blank=True, db_index=True)
description = models.Charfield(max_length=200, blank=True)
class ProgramActivity(models.Model):
program = models.ForeignKey(Program, on_delete=models.CASCADE, your args etc...)
activity = models.ForeignKey(Activity, on_delete=models.CASCADE, your args here etc...)
path_order = models.PositiveSmallIntegerField() # stores a number for the order in the program path
class UserProgram(models.Model):
student = models.ForeignKey(User, etc...) # FK to the user
program = models.ForeignKey(Program, etc...) # connects users to programs
progress = models.DecimalField(max_digits=5, decimal_places=2) # stores percentage of program completed (for example)
In this schema scenario, the following are true:
Activitys are all tracked and stored together on a single table,
organized by type, and each one can have its own description.
Programs are stored on their own model, and
represent a named object that unites all their constituent activities.
ProgramActivity connects activities to specific programs, and
allows you to set the order in the path for that activity relative to the program, and change it easily if you have to. You can easily query activities = ProgramActivity.objects.filter(program=some_program).order_by('path_order') and get a very usable list of a Program's activities.
Finally the UserProgram model records a User's "enrollments" and
progress in each, in this example, by percentage of the program completed.
This is just one possible approach. You may, for example, want to create an activity type table instead of a list dropdown, which may be a more robust way of managing activities over time.

Django M2M with a large table

I have a typical M2M scenario where promotion activities are related to our retailers. However we have a large number of retailers (over 10k) and therefore I can't use the normal multiple select widget.
What I would aim to do is have an 'activity' instance page with a 'retailer' sub-page which would have a table listing all those retailers currently related to the activity. In addition there would be a 'delete' checkbox next to each retailer so they could be removed from the list if necessary. (Naturally, I would also have another search/results page where users could select which retailers they want to add to the list, but I'm sure I can sort that out myself).
Could someone point me in the right direction regarding modelforms and formset factories as I'm not sure where to go from here. It would seem obvious to directly manipulate the app_activity_associated_retailers table but I don't think I can do this with the existing functions. Is there was a pattern for doing this.
class Activity(models.Model):
budget = models.ForeignKey('Budget')
activity_type = models.ForeignKey('ActivityType')
amount = models.DecimalField(max_digits=8, decimal_places=2)
associated_retailers = models.ManyToManyField('Retailer', related_name='associated_retailers')
class Retailer(models.Model):
name = models.CharField(max_length=50)
address01 = models.CharField(max_length=50)
address02 = models.CharField(max_length=50, blank=True)
postcode = models.CharField(max_length=5)
city = models.CharField(max_length=20)
All ManyToManyFields have a through model, whether you define one yourself or not. In your case, it'll have an id, an activity field and a retailer field. You can access the table with Activity.associated_retailers.through -- one "obvious" way is to just expose it as a "model" like
ActivityRetailer = Activity.associated_retailers.through
You can now manipulate these relationships like they were any ol' Django model, so you can i.e. generate querysets like
retailer_records_for_activity = ActivityRetailer.objects.filter(activity_id=1234)
... and you can also create model formsets (complete with that delete checkbox if so configured) for these pseudo-models.

Django many to many recursive relationship

I'm not so great with databases so sorry if I don't describe this very well...
I have an existing Oracle database which describes an algorithim catalogue.
There are two tables algorithims and xref_alg.
Algorithims can have parents and children algorithms. Alg_Xref contains these relationships with two foreign keys - xref_alg and xref_parent.
These are the Django models I have so far from the inspectdb command
class Algorithms(models.Model):
alg_id = models.AutoField(primary_key=True)
alg_name = models.CharField(max_length=100, blank=True)
alg_description = models.CharField(max_length=1000, blank=True)
alg_tags = models.CharField(max_length=100, blank=True)
alg_status = models.CharField(max_length=1, blank=True)
...
class Meta:
db_table = u'algorithms'
class AlgXref(models.Model):
xref_alg = models.ForeignKey(Algorithms, related_name='algxref_alg' ,null=True, blank=True)
xref_parent = models.ForeignKey(Algorithms, related_name='algxref_parent', null=True, blank=True)
class Meta:
db_table = u'alg_xref'
On trying to query AlgXref I encounter this:
DatabaseError: ORA-00904: "ALG_XREF"."ID": invalid identifier
So the error seems to be that it looks for a primary key ID which isn't in the table.. I could create one but seems a bit pointless. Is there anyway to get around this? Or change my models?
EDIT: So after a bit of searching it seems that Django requires a model to have a primary key. Life is too short so have just added a primary key. Will this have any impact on performance?
This is currently a limitation of the ORM provided by Django. Each model has to have one field marked as primary_key=True, if there isn't one, the framework automatically creates an AutoField with name id.
However, this is being worked on as we speak as part of this year's Google Summer of Code and hopefully will be in Django by the end of this year. For now you can try to use the fork of Django available at https://github.com/koniiiik/django which contains an implementation (which is not yet complete but should be sufficient for your purposes).
As for whether there is any benefit or not, that depends. It certainly makes the database more reusable and causes less headaches if you just add an auto incrementing id column to each table. The performance impact shouldn't be too high, the only thing you might notice is that if you have a many-to-many table like this, containing only two ForeignKey columns, adding a third one will increase its size by one half. That should, however, be irrelevant as long as you don't store billions of rows in that table.

Is it ok to use a dictionary on the database for an app with lots of records?

I'm am working on an app which will handle lots of information and am looking for the best way of creating my models. Since I have never worked with apps that deal with so many records, database optimization is not a topic I know lots of, but it seems to me that a good design is a good place to start.
Right now, I have a table for customers, a table for products and a table for product-customer (since we assign a code for each product a customer buys). Since I want to track the balances, there is also a balance table. My models look like this at the moment:
class Customer(models.Model):
first_name = models.CharField(max_length=35)
last_name = models.CharField(max_length=35)
customer_ID= models.IntegerField(primary_key=True)
phone = models.CharField(max_length=10, blank=True, null=True)
class Product(models.Model):
product_ID = models.IntegerField(primary_key=True)
product_code = models.CharField(max_length=25)
invoice_date = models.DateField()
employee = models.ForeignKey(Employee, null=True, blank=True)
product_active = models.BooleanField()
class ProductCustomer(models.Model):
prod = models.ForeignKey(Product, db_index=True)
cust = models.ForeignKey(Customer, db_index=True)
product_customer_ID = models.IntegerField(primary_key=True)
[...]
class Balance(models.Model):
product_customer = models.ForeignKey(ProductCustomer, db_index=True)
balance = models.DecimalField(max_digits=10, decimal_places=2)
batch = models.ForeignKey(Batch)
[...]
The app will return the 'history' of the customer. If the pax was overdue at some point and then he paid and then was due for a refund, etc.
I was thinking if I should insert a CharField on the Pax table which would hold a dictionary with date:status (the status could be calculated and added to the dictionary when I upload the information) or if it is more efficient to do a query on the Balance table, or if there is a better solution to be implemented.
Since there are thousands of products and even more customers, we are talking about around 400K records for the balances on a weekly basis... I am concerned about what can be done to ensure the app runs smoothly.
If I understand your question you seem to be asking about whether the join conditions will impose an unreasonable burden on your lookup query. To some extent this depends on your rdbms. My recommendation is that you go with PostgreSQL over MySQL because MySQL's innodb tables are heavily optimized for primary key lookups and this means two btrees have to be traversed in order to find the records on a join. PostgreSQL on the other hand allows for physical scans of tables meaning foreign key lookups are a bit faster usually.
In general yes, the dictionary approach is fine for an app with lots of records. The questions typically come out of how you are querying and how many records you are pulling in a given query. That is a much larger factor than how many records are stored, at least for a db like PostgreSQL.

Showing partial tabular inline models in Django

Let's see if I can explain myself, I have this models:
class BillHeader(models.Model):
number = models.CharField(_('Bill number'), max_length=10, unique=True, \
default=__number)
client = models.ForeignKey(ClienteYProveedor, verbose_name=_('Client'))
date = models.DateTimeField(_('Date'), default=datetime.now)
def __unicode__(self):
return str(self.number)
class Meta:
abstract = True
class BillFooter(models.Model):
base_import = models.DecimalField(_('Base import'), max_digits=12, \
decimal_places=2)
class Meta:
abstract = True
class BillBody(models.Model):
description = models.CharField(_('Description'), max_length=200)
amount = models.DecimalField(_('Amount'), max_digits=6, decimal_places=2)
discount = models.DecimalField(_('Discount'), max_digits=4, \
decimal_places=2)
price = models.DecimalField(_('Price'), max_digits=12, decimal_places=2)
unitaryprice = models.DecimalField(_('Unitary Price'), max_digits=12, \
decimal_places=2)
def __unicode__(self):
return self.description
class Meta:
abstract = True
class EmittedBill(BillHeader, BillBody, BillFooter):
pass
class ReceivedBill(BillHeader, BillBody, BillFooter):
pass
When the user adds an Emmited or Received bill I need to show BillHeader as a normal fieldset, but BillBody and BillFooter need to be TabularInline.
If I put those as TabularInline in admin.py, an error rises saying that they need a ForeignKey to the related models. Of course, I can't put those foreign keys, because they are declared at the bottom. I think you guys call this "backwards foreign keys".
My question is this: how can I do this to show TabularInlines in the admin without making a mess?. I can do it without abstract base classes, but then another problem comes, it shows the other ForeignKey in the TabularInline (if you are on EmmitedBills it shows the FK to ReceivedBills in the TabularInline and viceversa) and I couldn't been able to exclude them.
Sorry if this is a stupid question, but I'm not a programmer (it's just a hobby) and I'm really making myself a mess with data models.
I'll explain better:
I have two types of bills, Emitted and Received and both of them show on the admin home (that's why I didn't use a BooleanField to mark them). Both types have the same fields except one, the bill number, which in Emmitted will be autogenerated. Each bill consists on 1 header with number, client and date, 1 or more body inline entries with a description, amount, price, etc. and 1 inline footer, showing the total price without taxes, applied taxes, etc.
Update
I have done everything, but I have a problem, in my new model BillBody has two FK's (EmmitedBill and ReceivedBill) and they show up in the TabularInline. How can I hide them?field.exclude() gives an error.
I don't fully understand your question, but you can use
ForeignKey('ModelName')
instead of
ForeignKey(ModelName)
if the ModelName model isn't already declared. Maybe this solves your problem.
Inline admins (like TabularInline) are only used when you have a one-to-many relation, which is created by a ForeignKey on the many side. If you don't have such a foreign key, then you cannot use a inline admin. Inheritance is definitely different from a ForeignKey.
However, I think your data model is wrong. It seems like you do want to store bills. There are two types of bills, emitted and received bills. Both emitted and received bills do have the same fields. Furthermore, you want that each bill consists of a header with number, client and date, 1 or more body entries, where each entry stores the information you store in BillBody and 1 or more decimals base_number.
A probably better data model for you
class Bill(models.Model):
number = models.CharField(_('Bill number'), max_length=10, unique=True, default=__number)
client = models.ForeignKey(ClienteYProveedor, verbose_name=_('Client'))
date = models.DateTimeField(_('Date'), default=datetime.now)
def get_total_price(self):
return sum([entry.price for entry in self.entries])
class BillEntry(models.Model):
bill = models.ForeignKey(Bill, related_name='entries')
description = models.CharField(_('Description'), max_length=200)
amount = models.DecimalField(_('Amount'), max_digits=6, decimal_places=2)
discount = models.DecimalField(_('Discount'), max_digits=4, decimal_places=2)
price = models.DecimalField(_('Price'), max_digits=12, decimal_places=2)
unitaryprice = models.DecimalField(_('Unitary Price'), max_digits=12, decimal_places=2)
I have left out the __unicode__ methods.
Now you have a foreign key from BillEntry to Bill and you can use a tabular inline. I didn't understand your usage of base_import so I left this out.
Price computation
If your price should always equal something like amount*unitaryprice - discount or amount*(unitaryprice-discount) then you shouldn't put this in a field but either compute it when it is needed, either in Python or in the database. If you want to do this in Python you can use a method similar to get_total_price. If you want to compute them when querying the database then it is a little bit more difficult to get it working with Django.
In the last case, you can have a look at SQL views, but I think this is a little bit too difficult for a beginner. Another option is to use a custom SQL expression:
BillEntry.objects.extra(select={'price': 'amount*unitaryprice-discount'})
This will compute the price for all entries during selection.
Update
If you add two subclasses for emitted and received bills and use multi table inheritance then you can use one foreign key from BillEntry to Bill.
class EmittedBill(Bill):
pass
class ReceivedBill(Bill):
pass
You probably also have to think about the database model generated by Django. Usually, you only want to store elementary data in the database, and not computed data (like you want to do in your footer). So, if prices are computed using some formula and using the unitaryprice, amount etc. you shouldn't store the result of this formula but recompute it when necessary (and eventually cache to avoid re-computations). If you don't do this, chances are that you at some moment update something (for example the amount) and forget to update the computed values (the price) which leads to inconsistencies in your database (and thus bugs in your application). A good database does have constraints so that it is impossible to store inconsistent database without breaking at least one constraint.
I also don't see why you want a separate header and footer per bill. A model isn't the real bill, it stores the information for a bill. If you want to have a visible header and footer, then you should do this in your view layer (the template) and not in the model itself.