I have the below structure, where content modules, which are subclassed from a common model, are attached to pages via a 'page module' model that references them via a GenericForeignKey:
class SitePage(models.Model):
    title = models.CharField()
    # [etc..]

class PageModule(models.Model):
    page = models.ForeignKey(SitePage, db_index=True, on_delete=models.CASCADE)
    module_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    module_id = models.PositiveIntegerField()
    module_object = GenericForeignKey("module_type", "module_id")

class CommonModule(models.Model):
    published_time = models.DateTimeField()

class SingleImage(CommonModule):
    title = models.CharField()
    # [etc..]

class Article(CommonModule):
    title = models.CharField()
    # [etc..]
At the moment, populating pages from this results in a LOT of SQL queries. I want to fetch all the module contents (i.e. all the SingleImage and Article instances) for a given page in the most database-efficient manner.
I can't just do a straight prefetch_related because it "must be restricted to a homogeneous set of results", and I'm fetching multiple content types.
I can get each module type individually:
image_modules = PageModule.objects.filter(page=whatever_page, module_type=ContentType.objects.get_for_model(SingleImage)).prefetch_related('module_object')
article_modules = PageModule.objects.filter(page=whatever_page, module_type=ContentType.objects.get_for_model(Article)).prefetch_related('module_object')
all_modules = image_modules | article_modules
But I need to sort them:
all_modules.order_by('module_object__published_time')
and I can't because:
"Field 'module_object' does not generate an automatic reverse relation
and therefore cannot be used for reverse querying"
... and I don't think I can add the recommended GenericRelation field to all the content models because there's already content in there.
So... can I do this at all? Or am I stuck?
Following the advice in the comments above I eventually arrived at this code (from 2012!) that has roughly halved the number of queries:
https://gist.github.com/justinfx/3095246
However, as I noted above, it's done that at the expense of creating some fairly inefficient WHERE pk IN() queries, so I've not actually saved much time in total.
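For anyone landing here, the general shape of this kind of approach (not the gist's actual code) can be sketched in plain Python, without Django, so it runs anywhere. The idea is to collect the ids per content type, do one bulk fetch per type, and then sort the merged results by published_time in memory instead of in SQL. The in-memory TABLES dict and the fetch_by_ids helper are hypothetical stand-ins for something like Model.objects.in_bulk(ids).

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical in-memory "tables": {content type name: {pk: row}}.
TABLES = {
    "singleimage": {
        1: {"title": "Sunset", "published_time": datetime(2020, 1, 3)},
    },
    "article": {
        1: {"title": "Hello", "published_time": datetime(2020, 1, 1)},
        2: {"title": "World", "published_time": datetime(2020, 1, 5)},
    },
}

# Stand-ins for the PageModule rows of one page: (module_type, module_id).
page_modules = [("article", 2), ("singleimage", 1), ("article", 1)]

fetch_calls = []  # records one entry per simulated bulk query


def fetch_by_ids(type_name, ids):
    """Pretend bulk query (hypothetical helper): one call per content type."""
    fetch_calls.append(type_name)
    return {pk: TABLES[type_name][pk] for pk in ids}


def load_modules(pairs):
    # 1. Collect the ids wanted for each content type.
    ids_by_type = defaultdict(set)
    for type_name, pk in pairs:
        ids_by_type[type_name].add(pk)
    # 2. One bulk fetch per content type, not one per module.
    rows_by_type = {t: fetch_by_ids(t, ids) for t, ids in ids_by_type.items()}
    # 3. Resolve every pair, then sort in memory by published_time.
    modules = [rows_by_type[t][pk] for t, pk in pairs]
    return sorted(modules, key=lambda m: m["published_time"])


modules = load_modules(page_modules)
titles = [m["title"] for m in modules]
```

The number of queries then depends on the number of content types on the page rather than the number of modules, at the cost of doing the ordering in Python.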
I have 2 models with a one-to-many relation on a MySQL DB:
class Domains(models.Model):
    id = models.AutoField(primary_key=True)
    name = models.CharField(max_length=50, unique=True)
    description = models.TextField(blank=True, null=True)

class Kpis(models.Model):
    id = models.AutoField(primary_key=True)
    name = models.CharField(max_length=50, unique=True)
    description = models.TextField(blank=True, null=True)
    domain_id = models.ForeignKey(Domains, on_delete=models.CASCADE, db_column='domain_id')
In order to fetch ALL the domains with all their Kpis objects, I use this code with a for loop:
final_list = []
domains_list = Domains.objects.all()
for domain in domains_list:
    # For each domain, get all related KPIs
    domain_kpis = domain.kpis_set.values()
    final_list.append({domain: domain_kpis})
The total number of queries I run is 1 + the total number of domains I have, which is quite a lot.
I'm looking for a way to optimize this, preferably to execute it within only one query on the database. Is this possible?
You can use .prefetch_related(…) for this:
final_list = []
domains_list = Domains.objects.prefetch_related('kpis_set')
for domain in domains_list:
    # For each domain, get all related KPIs (served from the prefetch cache)
    domain_kpis = domain.kpis_set.all()
    final_list.append({domain: domain_kpis})
This makes two queries: one for the domains, and a second that loads all the related Kpis into memory at once.
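To make the "two queries" idea concrete outside Django, here is a rough sketch using only the standard-library sqlite3 module. It is not what Django runs internally, just the same pattern: one SELECT for the parents, one SELECT ... WHERE domain_id IN (...) for all the children, and the join done in Python. The table names and sample rows are made up for illustration.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE domains (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE kpis (
    id INTEGER PRIMARY KEY,
    name TEXT,
    domain_id INTEGER REFERENCES domains(id)
);
INSERT INTO domains VALUES (1, 'sales'), (2, 'support');
INSERT INTO kpis VALUES (1, 'revenue', 1), (2, 'margin', 1), (3, 'tickets', 2);
""")

# Record every statement executed from here on, to count the queries.
statements = []
conn.set_trace_callback(statements.append)

# Query 1: all the domains.
domains = conn.execute("SELECT id, name FROM domains ORDER BY id").fetchall()

# Query 2: all the KPIs for those domains in one go.
ids = [domain_id for domain_id, _name in domains]
placeholders = ",".join("?" * len(ids))
kpi_rows = conn.execute(
    f"SELECT id, name, domain_id FROM kpis "
    f"WHERE domain_id IN ({placeholders}) ORDER BY id",
    ids,
).fetchall()

conn.set_trace_callback(None)

# Join in memory: group the KPI names under their domain id.
kpis_by_domain = defaultdict(list)
for _kpi_id, kpi_name, domain_id in kpi_rows:
    kpis_by_domain[domain_id].append(kpi_name)

final_list = [{name: kpis_by_domain[domain_id]} for domain_id, name in domains]
query_count = sum(1 for s in statements if s.lstrip().upper().startswith("SELECT"))
```

However many domains there are, the count stays at two SELECTs, which is the whole point compared to the 1 + N loop in the question.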
Furthermore, please do not use .values(). You can serialize data to JSON with Django's serializer framework; by making use of .values() you "erode" the model layer. See the Serializing Django objects section of the documentation for more information.
Just wanted to add that you are asking for a solution to the "classic" N+1 queries problem. Here you can read something about it and also find examples of the prefetch_related method advised in Willem's answer.
Another thing worth mentioning is that you probably aren't supposed to use this dict, final_list.append({domain: domain_kpis}); instead you may want to map some field(s) from Domains to some field(s) from the Kpis model. If so, you can specify the exact fields you'd like to have prefetched using Prefetch:
from django.db.models import Prefetch

# Keep the foreign key (domain_id) in only(): Django needs it to attach each
# Kpis row to its domain, otherwise it has to be loaded separately per row.
domains_list = Domains.objects.prefetch_related(
    Prefetch('kpis_set', queryset=Kpis.objects.only('some_field_you_want_to_have', 'domain_id'))
)
final_list = []
for domain in domains_list:
    domain_kpis = domain.kpis_set.all()
    final_list.append({domain.some_field: [kpi.some_field_you_want_to_have for kpi in domain_kpis]})
This should give another performance boost on high-volume tables.
I'm building a web application and am currently having trouble putting together a results page for each user.
Here are what my models look like:
class Fault(models.Model):
    name = models.CharField(max_length=255)
    severity = models.PositiveSmallIntegerField(default=0)
    description = models.CharField(max_length=1024, null=False, blank=False)
    recommendation = models.CharField(max_length=1024, null=False, blank=False)
    date_added = models.DateTimeField(_('date added'), default=timezone.now)
    ...

class FaultInstance(models.Model):
    auto = models.ForeignKey(Auto)
    fault = models.ForeignKey(Fault)
    date_added = models.DateTimeField(_('date added'), default=timezone.now)
    objects = FaultInstanceManager()
    ...

class Auto(models.Model):
    label = models.CharField(max_length=255)
    model = models.CharField(max_length=255)
    make = models.CharField(max_length=255)
    year = models.IntegerField(max_length=4)
    user = models.ForeignKey(AUTH_USER_MODEL)
    ...
I don't know if my model relationships are ideal, but they made sense in my head. Each user can have multiple Auto objects associated with them, and each Auto can have multiple FaultInstance objects associated with it.
On the results page, I want to list all the FaultInstances that a user has across their Autos. Under each listed FaultInstance I will have a list of all the autos the user owns that have the fault, with their information (here is roughly what I had in mind).
All FaultInstance Listing Ordered by Severity (large number to low number)
FaultInstance:
FaultDescription:
FaultRecommendation:
ListofAutosWithFault:
AutoLabel AutoModel AutoYear ...
AutoLabel AutoModel AutoYear ...
Obviously, doing things the correct way means that I want to do as much of the list creation as possible on the Python/Django side and avoid doing any logic or processing in the template. I am able to create a list per severity with a model manager, as seen here:
class FaultInstanceManager(models.Manager):
    def get_faults_by_user_severity(self, user, severity):
        faults = defaultdict(list)
        qs_faultinst = self.model.objects.select_related().filter(
            auto__user=user, fault__severity=severity
        ).order_by('auto__make')
        for result in qs_faultinst:
            faults[result.fault].append(result)
        faults.default_factory = None
        return faults
I still need to specify each severity, but I guess if I only have 5 severity levels, I can create a list for each severity level and pass each one to the template. Any suggestions for this are appreciated. However, that's not my problem. My stopping point right now is that I want to create a summary table at the top of the report that gives the user a breakdown of fault instances per make|model|year. I can't think of the proper query or data structure to pass on to the template.
Summary (table of all the FaultInstances with the following column headers):
FaultInstance Make|Model|Year NumberOfAutosAffected
This will give me metrics for a make, a model, or a year (in the example below, it's separating faults based on make). I'm listing FaultInstances because I'm only listing Faults that are connected to a user.
For example:
Bad Starter     Nissan  1
Bad Taillight   Honda   2
Bad Taillight   Nissan  1
And I am such a perfectionist that I want to do this while optimizing database queries. If I can create a data structure in my original query that is easily parsed in the template and still get both of these sections in my report (maybe a defaultdict of a defaultdict(list)), that's what I want to do. Thanks for the help; hopefully my question is thorough and makes sense.
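To show what I mean by a defaultdict of a defaultdict(list), here is a small sketch in plain Python. The rows are hypothetical tuples standing in for the results of a single FaultInstance query; the function groups them by severity and then by fault in one pass, so there is no need for one query per severity level.

```python
from collections import defaultdict

# Hypothetical rows, as if read from one FaultInstance query:
# (severity, fault name, auto label).
rows = [
    (3, "Bad Starter", "Daily driver"),
    (1, "Bad Taillight", "Daily driver"),
    (1, "Bad Taillight", "Weekend car"),
]


def group_by_severity_and_fault(rows):
    """Group rows as {severity: {fault: [autos]}} in a single pass."""
    grouped = defaultdict(lambda: defaultdict(list))
    for severity, fault, auto in rows:
        grouped[severity][fault].append(auto)
    # Convert to plain dicts, highest severity first, so that a lookup of a
    # missing key in the template cannot silently create a new entry.
    return {
        severity: dict(grouped[severity])
        for severity in sorted(grouped, reverse=True)
    }


report = group_by_severity_and_fault(rows)
```

Converting to plain dicts at the end serves the same purpose as setting faults.default_factory = None in the manager above.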
It makes sense to use related names because it simplifies your query. Like this:
class FaultInstance(models.Model):
    auto = models.ForeignKey(Auto, related_name='fault_instances')
    fault = models.ForeignKey(Fault, related_name='fault_instances')
    ...

class Auto(models.Model):
    user = models.ForeignKey(AUTH_USER_MODEL, related_name='autos')
In this case you can use:
qs_faultinst = user.fault_instances.filter(fault__severity=severity).order_by('auto__make')
instead of:
qs_faultinst = self.model.objects.select_related().filter(
    auto__user=user, fault__severity=severity
).order_by('auto__make')
I can't figure out your summary table; maybe you meant:
Fault Make|Model|Year NumberOfAutosAffected
In this case you can use aggregation. But the grouping would still be slow if you have a lot of data. One easy solution is to denormalize the data by creating an extra model and a few signals to keep it updated, or you can use a cache.
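As a rough picture of what the aggregation computes, here is the same grouping done in plain Python on made-up rows. In the ORM this would typically be a values(...) call on the grouping fields followed by annotate() with Count(..., distinct=True); the field names for your models are left to you, and the data below is hypothetical.

```python
from collections import Counter

# Hypothetical rows: (fault name, make, auto id), one per FaultInstance.
fault_instances = [
    ("Bad Starter", "Nissan", 1),
    ("Bad Taillight", "Honda", 2),
    ("Bad Taillight", "Honda", 3),
    ("Bad Taillight", "Nissan", 1),
    ("Bad Taillight", "Nissan", 1),  # same fault reported twice on one auto
]


def summarize(rows):
    """Count distinct autos per (fault, make), like a GROUP BY with COUNT(DISTINCT)."""
    unique = set(rows)  # deduplicate so each auto counts once per fault
    counts = Counter((fault, make) for fault, make, _auto_id in unique)
    return sorted((fault, make, n) for (fault, make), n in counts.items())


summary = summarize(fault_instances)
```

Each tuple in summary is one row of the summary table: fault, make, and the number of autos affected.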
If you have a predefined set of severities then think about this:
class Fault(models.Model):
    SEVERITY_LOW = 0
    SEVERITY_MIDDLE = 1
    SEVERITY_HIGH = 2
    ...
    SEVERITY_CHOICES = (
        (SEVERITY_LOW, 'Low'),
        (SEVERITY_MIDDLE, 'Middle'),
        (SEVERITY_HIGH, 'High'),
        ...
    )
    ...
    severity = models.PositiveSmallIntegerField(default=SEVERITY_LOW,
                                                choices=SEVERITY_CHOICES)
    ...
In your templates you can just iterate through Fault.SEVERITY_CHOICES.
Update:
Change your models:
Move the model field out into a separate model:
class AutoModel(models.Model):
    name = models.CharField(max_length=255)
Change the model field of the Auto model:
class Auto(models.Model):
    ...
    auto_model = models.ForeignKey(AutoModel, related_name='cars')
    ...
Add a model:
class MyDenormalizedModelForReport(models.Model):
    fault = models.ForeignKey(Fault, related_name='reports')
    auto_model = models.ForeignKey(AutoModel, related_name='reports')
    year = models.IntegerField(max_length=4)
    number_of_auto_affected = models.IntegerField(default=0)
Add a signal:
from django.db.models.signals import post_save

def update_denormalized_model(sender, instance, created, **kwargs):
    # Only count newly created fault instances, not later edits.
    if created:
        rep, dummy_created = MyDenormalizedModelForReport.objects.get_or_create(
            fault=instance.fault, auto_model=instance.auto.auto_model, year=instance.auto.year)
        rep.number_of_auto_affected += 1
        rep.save()

post_save.connect(update_denormalized_model, sender=FaultInstance)
I have a situation where I'm trying to create a quick and easy admin interface for composers to list the instruments in a piece of music. What I am looking for is a single entity, an Instrumentation, which defines a particular combination of instruments. For example, a saxophone quartet might consist of:
Soprano sax
Alto sax
Tenor sax
Baritone sax
but it also might consist of two altos, tenor and bari instead. The problem gets worse when you try to add an entire section (like 1st violins, with as many as 18 members).
The initial model I came up with looks like this:
class Work(Post):
    authors = models.ManyToManyField(Individual)
    title = models.CharField(max_length=255)
    subtitle = models.CharField(max_length=255, blank=True)
    program_notes = models.TextField(blank=True)
    notes = models.TextField(blank=True)
    media = models.ManyToManyField('Upload')

class Composition(Work):
    instrumentation = models.ForeignKey('Instrumentation')

class Instrumentation(models.Model):
    forces = models.ManyToManyField(Instrument)
    types = models.ManyToManyField('InstrumentationType')

class InstrumentationType(models.Model):
    type = models.CharField(max_length=255)
    variation = models.SmallIntegerField(default=0)
    created = models.DateTimeField(auto_now_add=True)
    modified = models.DateTimeField(auto_now=True)
I plan to later map each instrument in the piece to a performer in a rehearsal, concert, etc., so it's more than a simple count that I need. If I were doing this without django (i.e. just SQL and database design), I would have a mapping table with
Instrumentation :
id (int serial PK),
type (FK),
composition_id (FK),
instrument_id (FK)
It looks like Django is creating this exact situation for me in the database, but for some reason the framework needs type, composition_id and instrument_id to be unique together. The admin interface (multiselect box) also makes it clear that having multiple similar entries isn't how the many to many field was designed to work. So how do I achieve this? Is there an established workaround for this?
The chosen answer to this question solves it. I needed to explicitly define the mapping table and then use the admin inline feature to fix the interface.
models.py:
class Instrumentation(models.Model):
    forces = models.ManyToManyField(Instrument, through='InstrumentationForces')
    types = models.ManyToManyField('InstrumentationType')

class InstrumentationForces(models.Model):
    instrument = models.ForeignKey(Instrument)
    instrumentation = models.ForeignKey(Instrumentation)
admin.py:
class InstrumentInline(admin.TabularInline):
    model = InstrumentationForces
    extra = 3

class InstrumentationAdmin(admin.ModelAdmin):
    filter_horizontal = ('types',)
    inlines = (InstrumentInline,)

admin.site.register(Instrumentation, InstrumentationAdmin)
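For reference, the reason the explicit mapping table helps can be sketched at the database level with the standard-library sqlite3 module. This is only an illustration with made-up table names, not the schema Django generates: the mapping table has its own primary key and no UNIQUE constraint on the pair of foreign keys, so the same instrument can appear more than once in one instrumentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE instrument (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE instrumentation (id INTEGER PRIMARY KEY);
-- Explicit mapping table: its own primary key and no UNIQUE constraint
-- on (instrumentation_id, instrument_id), so repeats are allowed.
CREATE TABLE instrumentation_forces (
    id INTEGER PRIMARY KEY,
    instrumentation_id INTEGER REFERENCES instrumentation(id),
    instrument_id INTEGER REFERENCES instrument(id)
);
INSERT INTO instrument VALUES (1, 'Alto sax'), (2, 'Tenor sax'), (3, 'Baritone sax');
INSERT INTO instrumentation VALUES (1);
-- A quartet with two altos: the alto row is simply inserted twice.
INSERT INTO instrumentation_forces (instrumentation_id, instrument_id)
VALUES (1, 1), (1, 1), (1, 2), (1, 3);
""")

# Count how many of each instrument the instrumentation contains.
forces = conn.execute(
    """
    SELECT i.name, COUNT(*)
    FROM instrumentation_forces AS f
    JOIN instrument AS i ON i.id = f.instrument_id
    WHERE f.instrumentation_id = ?
    GROUP BY i.id, i.name
    ORDER BY i.id
    """,
    (1,),
).fetchall()
```

Because each row of the mapping table has its own id, it can later be pointed at from a performer assignment, which a bare count could not do.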
I'm trying to store sections of a document in a Django app. The model looks like:
class Section(models.Model):
    project = models.ForeignKey(Project)
    parent_section = models.ForeignKey('Section', blank=True, null=True, related_name='child_set')
    predecessor_section = models.ForeignKey('Section', blank=True, null=True, related_name='predecessor_set')
    name = models.CharField(max_length=100)
    text = models.TextField(blank=True, null=True)
I create a whole lot of sections, link them (parent_section, predecessor_section) and store them by calling each of their save methods. However, when I look into the table after saving it, the parent_section_id and the predecessor_section_id are not set, even though I had objects attached to them before saving.
I assume it has to do with the fact that some parent_section instances don't have an id assigned as their instance hasn't been stored yet, but using manual transactions couldn't solve the problem.
Any thoughts on that?
Cheers,
Max
Objects do not have an id until you save them in the Django ORM.
So I'd say you need to save() the object, then reference it in your parent/child sections (and re-save the sections).
However, an alternative to storing previous and next as pointers is to store a sequence_index (spaced by 10 to allow further inserts without reordering) and order by this index.
Try doing a save() on all the objects, then update their relations, and then save() all of them again.
When you assign a foreign key, the related (target) object's id is copied. Since the related objects don't have an id yet at the moment the relations (parent_section, predecessor_section) are assigned, you get a funky result:
A = Section(name='A')
B = Section(name='B')
B.parent_section = A
A.save()
B.save()
B.parent_section # this will say A
B.parent_section_id # this will say **None**
But this should work:
A = Section(name='A')
B = Section(name='B')
A.save()
B.save()
B.parent_section = A
B.parent_section # this will say A
B.parent_section_id # this will say A.id
B.save() # don't forget this one :)
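If it helps, here is a toy model of the behavior described above, in plain Python. ToySection is a made-up class and NOT Django code; it only mimics "the id is copied when the relation is assigned", and newer Django releases handle unsaved related objects differently, so treat it purely as a picture of the explanation in this answer.

```python
class ToySection:
    """Toy stand-in for a model row (hypothetical, not Django)."""

    _next_id = 1

    def __init__(self, name):
        self.name = name
        self.id = None
        self.parent_section_id = None
        self._parent = None

    @property
    def parent_section(self):
        return self._parent

    @parent_section.setter
    def parent_section(self, parent):
        # The parent's id is copied at assignment time, whatever it is then.
        self._parent = parent
        self.parent_section_id = parent.id

    def save(self):
        # Saving hands out an id the first time, like an auto-increment key.
        if self.id is None:
            self.id = ToySection._next_id
            ToySection._next_id += 1


# Assign before saving: the copied id is still None afterwards.
a, b = ToySection("A"), ToySection("B")
b.parent_section = a
a.save()
b.save()
stale = b.parent_section_id

# Save first, then assign: the copied id is the real one.
c, d = ToySection("C"), ToySection("D")
c.save()
d.save()
d.parent_section = c
d.save()
fresh = d.parent_section_id
```

In the first case b still points at the right object in memory, but the value that would be written to the parent_section_id column is None, which matches what shows up in the table.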
This is the model for the view:
class QryDescChar(models.Model):
    iid_id = models.IntegerField()
    cid_id = models.IntegerField()
    cs = models.CharField(max_length=10)
    cid = models.IntegerField()
    charname = models.CharField(max_length=50)

    class Meta:
        db_table = u'qry_desc_char'
This is the SQL I use to create the view:
CREATE VIEW qry_desc_char as
SELECT
tbl_desc.iid_id,
tbl_desc.cid_id,
tbl_desc.cs,
tbl_char.cid,
tbl_char.charname
FROM tbl_desc,tbl_char
WHERE tbl_desc.cid_id = tbl_char.cid;
I don't know if I need a function in models or views or both. I want to get a list of objects from that database to display it. This might be easy, but I'm new to Django and Python so I'm having some problems.
Django 1.1 brought in a new feature that you might find useful. You should be able to do something like:
class QryDescChar(models.Model):
    iid_id = models.IntegerField()
    cid_id = models.IntegerField()
    cs = models.CharField(max_length=10)
    cid = models.IntegerField()
    charname = models.CharField(max_length=50)

    class Meta:
        db_table = u'qry_desc_char'
        managed = False
The documentation for the managed Meta class option is here. A relevant quote:
If False, no database table creation or deletion operations will be performed for this model. This is useful if the model represents an existing table or a database view that has been created by some other means. This is the only difference when managed is False. All other aspects of model handling are exactly the same as normal.
Once that is done, you should be able to use your model normally. To get a list of objects you'd do something like:
qry_desc_char_list = QryDescChar.objects.all()
To actually get the list into your template you might want to look at generic views, specifically the object_list view.
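To see that a view can be read just like a table, independent of Django, here is a small sketch with the standard-library sqlite3 module. The table and view names follow the question; the sample rows are made up, and the join is written with an explicit JOIN instead of the comma syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_char (cid INTEGER PRIMARY KEY, charname TEXT);
CREATE TABLE tbl_desc (iid_id INTEGER, cid_id INTEGER, cs TEXT);
INSERT INTO tbl_char VALUES (1, 'color'), (2, 'shape');
INSERT INTO tbl_desc VALUES (10, 1, 'red'), (10, 2, 'round'), (11, 1, 'blue');

-- Same columns as the view in the question.
CREATE VIEW qry_desc_char AS
SELECT tbl_desc.iid_id, tbl_desc.cid_id, tbl_desc.cs,
       tbl_char.cid, tbl_char.charname
FROM tbl_desc
JOIN tbl_char ON tbl_desc.cid_id = tbl_char.cid;
""")

# Selecting from the view looks exactly like selecting from a table.
rows = conn.execute(
    "SELECT iid_id, cs, charname FROM qry_desc_char ORDER BY iid_id, cid_id"
).fetchall()
```

An unmanaged model pointed at qry_desc_char issues the same kind of SELECT; Django just never tries to create or drop the view itself.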
If your RDBMS lets you create writable views and the view you create has exactly the same structure as the table Django would create, I guess that should work directly.
(This is an old question, but is an area that still trips people up and is still highly relevant to anyone using Django with a pre-existing, normalized schema.)
In your SELECT statement you will need to add a numeric "id" because Django expects one, even on an unmanaged model. You can use the row_number() window function to accomplish this if there isn't a guaranteed unique integer value on the row somewhere (and with views this is often the case).
In this case I'm using an ORDER BY clause with the window function, but you can do anything that's valid, and while you're at it you may as well use a clause that's useful to you in some way. Just make sure you do not try to use Django ORM dot references to relations because they look for the "id" column by default, and yours are fake.
Additionally I would consider renaming my output columns to something more meaningful if you're going to use it within an object. With those changes in place the query would look more like (of course, substitute your own terms for the "AS" clauses):
CREATE VIEW qry_desc_char as
SELECT
row_number() OVER (ORDER BY tbl_char.cid) AS id,
tbl_desc.iid_id AS iid_id,
tbl_desc.cid_id AS cid_id,
tbl_desc.cs AS a_better_name,
tbl_char.cid AS something_descriptive,
tbl_char.charname AS name
FROM tbl_desc,tbl_char
WHERE tbl_desc.cid_id = tbl_char.cid;
Once that is done, in Django your model could look like this:
class QryDescChar(models.Model):
    iid_id = models.ForeignKey('WhateverIidIs', related_name='+',
                               db_column='iid_id', on_delete=models.DO_NOTHING)
    cid_id = models.ForeignKey('WhateverCidIs', related_name='+',
                               db_column='cid_id', on_delete=models.DO_NOTHING)
    a_better_name = models.CharField(max_length=10)
    something_descriptive = models.IntegerField()
    name = models.CharField(max_length=50)

    class Meta:
        managed = False
        db_table = 'qry_desc_char'
You don't need the "_id" part on the end of the id column names, because you can declare the column name on the Django model with something more descriptive using the "db_column" argument as I did above (but here I only use it to prevent Django from adding another "_id" to the end of cid_id and iid_id -- which added zero semantic value to your code). Also, note the "on_delete" argument. Django does its own thing when it comes to cascading deletes, and on an interesting data model you don't want this -- and when it comes to views you'll just get an error and an aborted transaction. Prior to Django 1.5 you have to patch it to make DO_NOTHING actually mean "do nothing" -- otherwise it will still try to (needlessly) query and collect all related objects before going through its delete cycle, and the query will fail, halting the entire operation.
Incidentally, I wrote an in-depth explanation of how to do this just the other day.
You are trying to fetch records from a view. This is not correct as a view does not map to a model, a table maps to a model.
You should use Django ORM to fetch QryDescChar objects. Please note that Django ORM will fetch them directly from the table. You can consult Django docs for extra() and select_related() methods which will allow you to fetch related data (data you want to get from the other table) in different ways.