Django query with distinct and order_by

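I have two models:
import datetime

from django.db import models

class Employer(models.Model):
    name = models.CharField(max_length=300, blank=False)
    # Django only allows a field named "id" if it is the primary key
    id = models.IntegerField(primary_key=True)
    status = models.IntegerField()
    eminence = models.IntegerField(null=False, default=4)

class JobTitle(models.Model):
    name = models.CharField(max_length=300, blank=False)
    # on_delete is required since Django 2.0; CASCADE was the earlier implicit default
    employer = models.ForeignKey(Employer, unique=False, null=True, on_delete=models.CASCADE)
    activatedate = models.DateTimeField(default=datetime.datetime.now)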
I need all employers, ordered by how recently their job titles were activated (most recent first).
Employer.objects.filter(status=1).order_by('eminence','-jobtitle__activatedate')
This query gives me what I want, but it returns repeated employers if an employer has more than one job title.
I would use distinct(), but in the Django documentation I found this:
*Any fields used in an order_by() call are included in the SQL SELECT columns. This can sometimes lead to unexpected results when used in conjunction with distinct(). If you order by fields from a related model, those fields will be added to the selected columns and they may make otherwise duplicate rows appear to be distinct. Since the extra columns don't appear in the returned results (they are only there to support ordering), it sometimes looks like non-distinct results are being returned.*
Although this explains my problem, no solution is given.
Could you suggest how I can group my employer list without compromising API stability?

You can store the latest JobTitle activation date on the Employer model itself, and then order by that field without going through the relation.[1] The tradeoff is some data duplication, plus the need to update the employer's latest activation date manually whenever a corresponding JobTitle instance changes.
Also, check this post for another solution which uses annotate().
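The annotate() route aggregates before ordering. In ORM terms it would look something like Employer.objects.filter(status=1).annotate(latest=Max('jobtitle__activatedate')).order_by('eminence', '-latest'), with Max imported from django.db.models and latest being an arbitrary name chosen here. The sketch below uses only the standard-library sqlite3 module, with simplified, hypothetical tables and rows (not Django's exact generated SQL), to show why DISTINCT fails when the ordering column comes from the joined table, and why grouping with MAX gives one row per employer.

```python
import sqlite3

# Simplified, hypothetical tables and rows mirroring the question's models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employer (id INTEGER PRIMARY KEY, name TEXT, status INTEGER, eminence INTEGER);
CREATE TABLE jobtitle (id INTEGER PRIMARY KEY, name TEXT,
                       employer_id INTEGER REFERENCES employer(id), activatedate TEXT);
INSERT INTO employer VALUES (1, 'Acme', 1, 4), (2, 'Globex', 1, 4);
INSERT INTO jobtitle VALUES
    (1, 'Clerk', 1, '2012-01-01'),
    (2, 'Manager', 1, '2012-03-01'),
    (3, 'Engineer', 2, '2012-02-01');
""")

# Like order_by('eminence', '-jobtitle__activatedate').distinct(): the ordering
# columns end up in the SELECT list, so DISTINCT compares them too and an
# employer with two job titles still yields two "distinct" rows.
joined = conn.execute("""
    SELECT DISTINCT e.id, e.name, e.eminence, j.activatedate
    FROM employer e LEFT JOIN jobtitle j ON j.employer_id = e.id
    WHERE e.status = 1
    ORDER BY e.eminence, j.activatedate DESC
""").fetchall()

# Aggregate first: one row per employer carrying its latest activation date,
# which is the idea behind annotate(latest=Max('jobtitle__activatedate')).
grouped = conn.execute("""
    SELECT e.id, e.name, MAX(j.activatedate) AS latest
    FROM employer e LEFT JOIN jobtitle j ON j.employer_id = e.id
    WHERE e.status = 1
    GROUP BY e.id, e.name, e.eminence
    ORDER BY e.eminence, latest DESC
""").fetchall()

print([row[1] for row in joined])   # ['Acme', 'Globex', 'Acme']
print([row[1] for row in grouped])  # ['Acme', 'Globex']
```

Because the aggregate is computed per employer before sorting, no distinct() call is needed at all in the grouped version.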
Related questions:
[1] Queryset API distinct() does not work?

Related

How to Get a Distinct Filtered QuerySet In Django Without Using the distinct Method?

Below is my post model.
from django.conf import settings
from django.db import models

class Post(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    title = models.CharField(max_length=200)
    content = models.TextField()
    datetime = models.DateTimeField(auto_now_add=True)
    votes = models.ManyToManyField(settings.AUTH_USER_MODEL,
                                   related_name="post_votes", default=None, blank=True)
    tags = models.ManyToManyField(Tag, default=None, blank=True)  # Tag is defined elsewhere
I want to filter posts which contain a certain query in their title, content or as the name of one of their tags. To do this I've tried:
query_set = Post.objects.filter(Q(content__icontains=query) |
                                Q(tags__name__icontains=query) |
                                Q(title__icontains=query))
But this often returns QuerySets with duplicate results. I have tried using the distinct method to solve this, but that results in incorrect ordering when I sort the posts later on by the number of votes they have:
query_set.annotate(vote_count=Count('votes')).order_by('-vote_count', '-datetime')
If anybody could help me I would be very grateful.
Jack
The duplicates originate from the fact that you filter on related objects, which means Django performs a query with a JOIN in it. You can of course filter for uniqueness at the Django/Python level, but that is inefficient in two ways: it transmits more data from the database to the Django server, and deduplicating large collections in Python is much slower than letting the database do it.
Furthermore the line:
query_set.annotate(vote_count=Count('votes')).order_by('-vote_count', '-datetime')
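is effectively a no-op. QuerySet methods such as annotate() and order_by() do not modify a QuerySet in place; they return a new one. Here you did not sort the QuerySet on votes: you constructed a new QuerySet that would do that, but you immediately threw it away because you do nothing with the result (you would need query_set = query_set.annotate(...).order_by(...)).
You can add the annotation and ordering in the same chain and call distinct() at the end: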
from django.db.models import Count, Q

query_set = Post.objects.filter(
    Q(content__icontains=query) |
    Q(tags__name__icontains=query) |
    Q(title__icontains=query)
).annotate(
    vote_count=Count('votes', distinct=True)
).order_by('-vote_count', '-datetime').distinct()
The distinct=True on the Count is necessary because, as said before, the query contains JOINs, and JOINs act like "multipliers" when counting: the same vote row can occur multiple times, once for every matching tag.
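The multiplication effect is easy to reproduce outside Django. This sketch uses the standard-library sqlite3 module with simplified, hypothetical stand-ins for the post, tag and vote tables (not the table names Django would generate): one post with two matching tags and three votes. The tag join alone produces duplicate rows, and adding the vote join makes a plain COUNT return 2 x 3 = 6 instead of 3, while COUNT(DISTINCT ...), which is roughly what Count('votes', distinct=True) generates, returns the right number.

```python
import sqlite3

# Hypothetical data: one post, two tags that both match "django", three votes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE post_tags (post_id INTEGER, tag_name TEXT);
CREATE TABLE post_votes (post_id INTEGER, user_id INTEGER);
INSERT INTO post VALUES (1, 'Some tips');
INSERT INTO post_tags VALUES (1, 'django'), (1, 'django-orm');
INSERT INTO post_votes VALUES (1, 10), (1, 11), (1, 12);
""")

# Filtering through the tags join yields one row per matching tag.
matches = conn.execute("""
    SELECT p.id FROM post p
    JOIN post_tags t ON t.post_id = p.id
    WHERE t.tag_name LIKE '%django%'
""").fetchall()


def vote_counts(count_expr):
    """Count votes per post while the tag join is still in the query."""
    return conn.execute(f"""
        SELECT p.id, {count_expr} AS vote_count
        FROM post p
        JOIN post_tags t ON t.post_id = p.id
        LEFT JOIN post_votes v ON v.post_id = p.id
        WHERE t.tag_name LIKE '%django%'
        GROUP BY p.id
    """).fetchall()


naive = vote_counts("COUNT(v.user_id)")           # like Count('votes')
fixed = vote_counts("COUNT(DISTINCT v.user_id)")  # like Count('votes', distinct=True)
print(matches, naive, fixed)  # [(1,), (1,)] [(1, 6)] [(1, 3)]
```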

Django M2M with a large table

I have a typical M2M scenario where promotion activities are related to our retailers. However, we have a large number of retailers (over 10k), so I can't use the normal multiple-select widget.
What I would aim to do is have an 'activity' instance page with a 'retailer' sub-page which would have a table listing all those retailers currently related to the activity. In addition there would be a 'delete' checkbox next to each retailer so they could be removed from the list if necessary. (Naturally, I would also have another search/results page where users could select which retailers they want to add to the list, but I'm sure I can sort that out myself.)
Could someone point me in the right direction regarding model forms and formset factories, as I'm not sure where to go from here? It would seem obvious to directly manipulate the app_activity_associated_retailers table, but I don't think I can do this with the existing functions. Is there a pattern for doing this?
class Activity(models.Model):
    budget = models.ForeignKey('Budget', on_delete=models.CASCADE)
    activity_type = models.ForeignKey('ActivityType', on_delete=models.CASCADE)
    amount = models.DecimalField(max_digits=8, decimal_places=2)
    associated_retailers = models.ManyToManyField('Retailer', related_name='associated_retailers')

class Retailer(models.Model):
    name = models.CharField(max_length=50)
    address01 = models.CharField(max_length=50)
    address02 = models.CharField(max_length=50, blank=True)
    postcode = models.CharField(max_length=5)
    city = models.CharField(max_length=20)
All ManyToManyFields have a through model, whether you define one yourself or not. In your case, it'll have an id, an activity field and a retailer field. You can access the table with Activity.associated_retailers.through -- one "obvious" way is to just expose it as a "model" like
ActivityRetailer = Activity.associated_retailers.through
You can now manipulate these relationships as if they were any ol' Django model, so you can, e.g., generate querysets like
retailer_records_for_activity = ActivityRetailer.objects.filter(activity_id=1234)
... and you can also create model formsets (complete with that delete checkbox if so configured) for these pseudo-models.

select_related with reverse foreign keys

I have two Models in Django. The first has the hierarchy of what job functions (positions) report to which other positions, and the second is people and what job function they hold.
class PositionHierarchy(models.Model):
    pcn = models.CharField(max_length=50)
    title = models.CharField(max_length=100)
    level = models.CharField(max_length=25)
    report_to = models.ForeignKey('PositionHierarchy', null=True, on_delete=models.CASCADE)

class Person(models.Model):
    first_name = models.CharField(max_length=50)
    last_name = models.CharField(max_length=50)
    ...
    position = models.ForeignKey(PositionHierarchy, on_delete=models.CASCADE)
When I have a Person record and I want to find the person's manager, I have to do
manager = person.position.report_to.person_set.all()[0]
# Can't use .first() because we haven't upgraded to 1.6 yet
If I'm getting people with a QuerySet, I can join position and report_to (and avoid a second trip to the database) using Person.objects.select_related('position', 'position__report_to').filter(...), but is there any way to avoid another trip to the database to get the person_set? I tried adding 'position__report_to__person_set' or just 'position__report_to__person' to the select_related, but that doesn't seem to change the query. Is this what prefetch_related is for?
I'd like to make a custom manager so that when I do a query to get Person records, I also get their PositionHierarchy and their manager's Person record without more round trips to the database. This is what I have so far:
class PersonWithManagerManager(models.Manager):
    def get_query_set(self):  # renamed get_queryset() in Django 1.6
        qs = super(PersonWithManagerManager, self).get_query_set()
        return qs.select_related(
            'position',
            'position__report_to',
        ).prefetch_related(
        )
Yes, that is what prefetch_related() is for. It will require an additional query, but the idea is that it will get all of the related information at once, instead of once per Person.
In your case:
qs = (qs.select_related('position__report_to')
        .prefetch_related('position__report_to__person_set'))
should require two queries, regardless of the number of Persons in the original query set.
Compare this example from the documentation:
>>> Restaurant.objects.select_related('best_pizza').prefetch_related('best_pizza__toppings')

How QuerySets are evaluated in Django?

I have the following models:
class Product(models.Model):
    """
    Basic product
    """
    name = models.CharField(max_length=100, db_column='name', unique=True)
    url = models.SlugField(max_length=100, db_column="url", unique=True, db_index=True)
    description = HTMLField(db_column='description')
    category = models.ForeignKey(Category, db_column='category', related_name='products',
                                 on_delete=models.CASCADE)

class FirstObject(Product):
    pass

class FirstProduct(models.Model):
    product = models.ForeignKey(FirstObject, db_column='product', on_delete=models.CASCADE)
    color = models.ForeignKey(Color, db_index=True, db_column='color', on_delete=models.CASCADE)

class SecondObject(Product):
    pass

class SecondProduct(models.Model):
    product = models.ForeignKey(SecondObject, db_column='product', on_delete=models.CASCADE)
    diameter = models.PositiveSmallIntegerField(db_column='diameter')
In other words, I have two different types of products (with different parameters).
For a particular category (a category can contain only one type of product, and I know which), I want to select all products together with their parameters.
How can this be accomplished efficiently?
If I write Category.objects.get(id=id).products.all() and then use the related manager to fetch the parameters of each product, does that mean the database is hit once for every product?
The second approach is to fetch all products in one query and then fetch all parameters in another,
then group them in a list/dictionary.
Which approach is best? Or is there another approach?
Thank you.
Your schema really does not lend itself well to querying. You will very quickly hit worst-case query behaviour (2 queries for every type of modification to a product). I suggest you look at the schema for django-shop-simplevariations and see how it achieves fast lookups (hint: the schema is structured so that prefetch_related is effective).
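The second approach in the question is essentially what prefetch_related() automates. For example, something like FirstObject.objects.filter(category_id=category_id).prefetch_related('firstproduct_set') should fetch the products and then all their FirstProduct rows in one additional query (firstproduct_set being the default reverse accessor, since no related_name is set). The sketch below shows the underlying idea with the standard-library sqlite3 module and simplified, hypothetical tables, counting round trips for both approaches.

```python
import sqlite3
from collections import defaultdict

# Simplified, hypothetical tables: "product" stands in for Product/FirstObject,
# "first_product" for the FirstProduct parameter rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
CREATE TABLE first_product (id INTEGER PRIMARY KEY, product_id INTEGER, color TEXT);
INSERT INTO product VALUES (1, 'Mug', 7), (2, 'Plate', 7), (3, 'Lamp', 8);
INSERT INTO first_product VALUES
    (1, 1, 'red'), (2, 1, 'blue'), (3, 2, 'white'), (4, 3, 'black');
""")

executed = []  # one entry per database round trip


def run(sql, params=()):
    """Execute one query and record it, so round trips can be counted."""
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


PRODUCTS_SQL = "SELECT id, name FROM product WHERE category_id = ? ORDER BY id"


def one_query_per_product(category_id):
    # First approach: 1 query for the products + 1 query per product.
    result = {}
    for pid, name in run(PRODUCTS_SQL, (category_id,)):
        rows = run("SELECT color FROM first_product WHERE product_id = ? ORDER BY id", (pid,))
        result[name] = [color for (color,) in rows]
    return result


def batched(category_id):
    # Second approach: products in one query, all their parameters in a second
    # query using IN (...), then grouped in a dict by product id.
    products = run(PRODUCTS_SQL, (category_id,))
    ids = [pid for pid, _ in products]
    if not ids:
        return {}
    placeholders = ", ".join(["?"] * len(ids))
    sql = f"SELECT product_id, color FROM first_product WHERE product_id IN ({placeholders}) ORDER BY id"
    colors = defaultdict(list)
    for pid, color in run(sql, ids):
        colors[pid].append(color)
    return {name: colors[pid] for pid, name in products}


slow = one_query_per_product(7)
slow_queries = len(executed)
executed.clear()
fast = batched(7)
fast_queries = len(executed)
print(slow_queries, fast_queries, fast)
```

The batched version stays at two round trips however many products the category holds, while the first grows by one query per product.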

Django many to many recursive relationship

I'm not so great with databases so sorry if I don't describe this very well...
I have an existing Oracle database which describes an algorithm catalogue.
There are two tables, algorithms and alg_xref.
Algorithms can have parent and child algorithms. alg_xref contains these relationships with two foreign keys, xref_alg and xref_parent.
These are the Django models I have so far, generated by the inspectdb command:
class Algorithms(models.Model):
    alg_id = models.AutoField(primary_key=True)
    alg_name = models.CharField(max_length=100, blank=True)
    alg_description = models.CharField(max_length=1000, blank=True)
    alg_tags = models.CharField(max_length=100, blank=True)
    alg_status = models.CharField(max_length=1, blank=True)
    ...

    class Meta:
        db_table = u'algorithms'

class AlgXref(models.Model):
    # current versions of inspectdb emit on_delete=models.DO_NOTHING
    xref_alg = models.ForeignKey(Algorithms, related_name='algxref_alg', null=True, blank=True,
                                 on_delete=models.DO_NOTHING)
    xref_parent = models.ForeignKey(Algorithms, related_name='algxref_parent', null=True, blank=True,
                                    on_delete=models.DO_NOTHING)

    class Meta:
        db_table = u'alg_xref'
On trying to query AlgXref I encounter this:
DatabaseError: ORA-00904: "ALG_XREF"."ID": invalid identifier
So the error seems to be that it looks for a primary key column ID which isn't in the table. I could create one, but that seems a bit pointless. Is there any way to get around this, or should I change my models?
EDIT: After a bit of searching, it seems that Django requires a model to have a primary key. Life is too short, so I have just added a primary key. Will this have any impact on performance?
This is currently a limitation of the ORM provided by Django. Each model has to have exactly one field marked as primary_key=True; if there isn't one, the framework automatically creates an AutoField named id.
However, this is being worked on as we speak as part of this year's Google Summer of Code and hopefully will be in Django by the end of this year. For now you can try to use the fork of Django available at https://github.com/koniiiik/django which contains an implementation (which is not yet complete but should be sufficient for your purposes).
As for whether there is any benefit or not, that depends. It certainly makes the database more reusable and causes fewer headaches if you just add an auto-incrementing id column to each table. The performance impact shouldn't be too high; the only thing you might notice is that in a many-to-many table like this one, which contains only two ForeignKey columns, adding a third column increases the size of each row's data by about half. That should, however, be irrelevant as long as you don't store billions of rows in that table.