Django select rows with duplicate field values for specific foreign key - django

again I would like to search for duplicates in my models, but now slightly different case.
Here are my models:
class Concept(models.Model):
main_name = models.ForeignKey(Literal)
...
class Literal(models.Model):
name = models.Charfield(...)
concept = models.ForeignKey(Concept)
...
And now the task I'm trying to achieve:
Select all literals that are NOT main_names, that have the same name for the same concept.
For example if I have literals:
[{id:1, name:'test', concept:1}, {id:2, name:'test', concept:1}]
and concepts:
[{id:1, main_name:1}]
Then in result I should get literal with the ID=2.

It sounds to me as though you want to execute a SQL query something like this:
SELECT l1.* FROM myapp_literal AS l1,
myapp_literal AS l2
WHERE l1.id <> l2.id
AND l1.name = l2.name
AND l1.concept = l2.concept
AND l1.id NOT IN (SELECT main_name FROM myapp_concept)
GROUP BY l1.id
Well, in cases where the query is too complex to easily express in Django's query language, you can always ask Django to do a raw SQL query—and this may be one of those cases.

If I understand your question you want:
All Literal objects that are not ForeignKey'd to Concept.
From that set, select those where the name and the concept is the same.
If so, I think this should work:
For the first part:
q = Literal.objects.exclude(pk__in=Concept.objects.values_list('id', flat=True))
EDIT:
Based on excellent feedback from Jan, I think for #2 you would need to use raw SQL.

Related

Converting SQL to something that Django can use

I am working on converting some relatively complex SQL into something that Django can play with. I am trying not to just use the raw SQL, since I think playing with the standard Django toolkit will help me learn more about Django.
I have already managed to break up parts of the sql into chunks, and am tackling them piecemeal to make things a little easier.
Here is the SQL in question:
SELECT i.year, i.brand, i.desc, i.colour, i.size, i.mpn, i.url,
COALESCE(DATE_FORMAT(i_eta.eta, '%M %Y'),'Unknown')
as eta
FROM i
JOIN i_eta ON i_eta.mpn = i.mpn
WHERE category LIKE 'kids'
ORDER BY i.brand, i.desc, i.colour, FIELD(size, 'xxl','xl','l','ml','m','s','xs','xxs') DESC, size+0, size
Here is what I have (trying to convert line by line):
(grabbed automatically when performing filters)
(have to figure out django doc on coalesce for syntax)
db alias haven't been able to find yet - it is crucial since there is a db view that requires it
already included in the original q
.select_related?
.filter(category="kids")
.objects.order_by('brand','desc','colour') - don't know how to deal with SQL FIELDS
Any advice would be appreciated!
Here's how I would structure this.
First, I'm assuming your models for i and i_eta look something like this:
class I(models.Model):
mpn = models.CharField(max_length=30, primary_key=True)
year = models.CharField(max_length=30)
brand = models.CharField(max_length=30)
desc = models.CharField(max_length=100)
colour = models.CharField(max_length=30)
size = models.CharField(max_length=3)
class IEta(models.Model):
i = models.ForeignKey(I, on_delete=models.CASCADE)
eta = models.DateField()
General thoughts:
To write the coalesce in Django: I would not replace nulls with "Unknown" in the ORM. This is a presentation-layer concern: it should be dealt with in a template.
For date formatting, you can do date formatting in Python.
Not sure what a DB alias is.
For using multiple tables together, you can use either select_related(), prefetch_related(), or do nothing.
select_related() will perform a join.
prefect_related() will get the foreign key ID's from the first queryset, then generate a query like SELECT * FROM table WHERE id in (12, 13, 14).
Doing nothing will work automatically, but has the disadvantage of the SELECT N+1 problem.
I generally prefer prefetch_related().
For customizing the sort order of the size field, you have three options. My preference would be option 1, but any of the three will work.
Denormalize the sort criteria. Add a new field called size_numeric. Override the save() method to populate this field when saving new instances, giving xxl the value 1, xl the value 2, etc.
Sort in Python. Essentially, you use Python's built-in sorting methods to do the sort, rather than sorting it in the database. This doesn't work well if you have thousands of results.
Invoke the MySQL function. Essentially, using annotate(), you add the output of a function to the queryset. order_by() can sort by that function.

django orm equivalent for "group by sth. having every(sth.)"

Suppose I have a model "Book" with three fields: name, author_name (as a string), publish_state.
Now I want to find out which authors have exactly one published book. This sql query does exactly what I want:
SELECT author_name from books GROUP BY author_name HAVING count(name) = 1 and every(publish_state = 'published');
I'm using Postgres as you can say by the 'every' clause.
Is there a way to do this in django ORM? or do I need to use a raw query for that?
There is probably a better way, but at least you can do the following ...
books = Book.objects.filter(publish_state='published')
authors = {}
for book in books:
authors.setdefault(book.author, 0)
authors[book.author] += 1
rookies = [x for x, y in authors.iteritems() if y == 1]
Django uses the having clause when you use filter() after an annotate like annotate(foo=Count('bar')).filter(foo__gt=3). Add in values_list and distinct like so and you should be good:
Book.objects.filter(publish_state='published').values_list('author_name', flat=True).order_by('author_name').annotate(count_books=Count('name')).filter(count_books=1).distinct()
Unfortunately this feature is not implemented in django's ORM, and I don't think it would be any time soon, as it is not a very common query.
I had to use raw SQL. You can find about it here:
https://docs.djangoproject.com/en/1.7/topics/db/sql/

Select DISTINCT individual columns in django?

I'm curious if there's any way to do a query in Django that's not a "SELECT * FROM..." underneath. I'm trying to do a "SELECT DISTINCT columnName FROM ..." instead.
Specifically I have a model that looks like:
class ProductOrder(models.Model):
Product = models.CharField(max_length=20, promary_key=True)
Category = models.CharField(max_length=30)
Rank = models.IntegerField()
where the Rank is a rank within a Category. I'd like to be able to iterate over all the Categories doing some operation on each rank within that category.
I'd like to first get a list of all the categories in the system and then query for all products in that category and repeat until every category is processed.
I'd rather avoid raw SQL, but if I have to go there, that'd be fine. Though I've never coded raw SQL in Django/Python before.
One way to get the list of distinct column names from the database is to use distinct() in conjunction with values().
In your case you can do the following to get the names of distinct categories:
q = ProductOrder.objects.values('Category').distinct()
print q.query # See for yourself.
# The query would look something like
# SELECT DISTINCT "app_productorder"."category" FROM "app_productorder"
There are a couple of things to remember here. First, this will return a ValuesQuerySet which behaves differently from a QuerySet. When you access say, the first element of q (above) you'll get a dictionary, NOT an instance of ProductOrder.
Second, it would be a good idea to read the warning note in the docs about using distinct(). The above example will work but all combinations of distinct() and values() may not.
PS: it is a good idea to use lower case names for fields in a model. In your case this would mean rewriting your model as shown below:
class ProductOrder(models.Model):
product = models.CharField(max_length=20, primary_key=True)
category = models.CharField(max_length=30)
rank = models.IntegerField()
It's quite simple actually if you're using PostgreSQL, just use distinct(columns) (documentation).
Productorder.objects.all().distinct('category')
Note that this feature has been included in Django since 1.4
User order by with that field, and then do distinct.
ProductOrder.objects.order_by('category').values_list('category', flat=True).distinct()
The other answers are fine, but this is a little cleaner, in that it only gives the values like you would get from a DISTINCT query, without any cruft from Django.
>>> set(ProductOrder.objects.values_list('category', flat=True))
{u'category1', u'category2', u'category3', u'category4'}
or
>>> list(set(ProductOrder.objects.values_list('category', flat=True)))
[u'category1', u'category2', u'category3', u'category4']
And, it works without PostgreSQL.
This is less efficient than using a .distinct(), presuming that DISTINCT in your database is faster than a python set, but it's great for noodling around the shell.
Update:
This is answer is great for making queries in the Django shell during development. DO NOT use this solution in production unless you are absolutely certain that you will always have a trivially small number of results before set is applied. Otherwise, it's a terrible idea from a performance standpoint.

Django query to select parent with nonzero children

I have model with a Foreign Key to itself like this:
class Concept(models.Model):
name = models.CharField(max_length=200)
category = models.ForeignKey('self')
But I can't figure out how I can select all concepts that have nonzero children value. Is this possible with django QuerySet API or I must write custom SQL?
If I understand it correctly, each Concept may have another Concept as parent, and this is set into the category field.
In other words, a Concept with at least a child will be referenced at least once in the category field.
Generally speaking, this is not really easy to get in Django; however if you do not have too many categories, you can think for a query of the like of SELECT * FROM CONCEPTS WHERE CONCEPTS.ID IN (SELECT CATEGORY FROM CONCEPTS); - and this is something you can map easily with Django:
Concept.objects.filter(pk__in=Concept.objects.all().values('category'))
Note that, as stated on Django documentation, this query may have performance issues on certain databases; therefore you should instead put it as a list:
Concept.objects.filter(id__in=list(Concept.objects.all().values('category')))
But please be aware that this could hit some database limitation -- for instance, Oracle allows up to 1000 elements in such lists.
How about something like this:
concepts = Concept.objects.exclude(category=None)
The way you have it written there will require a value for category. Once you have fixed that (with null=True in the field constructor), use this:
Concept.objects.filter(category__isnull=False)

Django DB, finding Categories whose Items are all in a subset

I have a two models:
class Category(models.Model):
pass
class Item(models.Model):
cat = models.ForeignKey(Category)
I am trying to return all Categories for which all of that category's items belong to a given subset of item ids (fixed thanks). For example, all categories for which all of the items associated with that category have ids in the set [1,3,5].
How could this be done using Django's query syntax (as of 1.1 beta)? Ideally, all the work should be done in the database.
Category.objects.filter(item__id__in=[1, 3, 5])
Django creates the reverse relation ship on the model without the foreign key. You can filter on it by using its related name (usually just the model name lowercase but it can be manually overwritten), two underscores, and the field name you want to query on.
lets say you require all items to be in the following set:
allowable_items = set([1,3,4])
one bruteforce solution would be to check the item_set for every category as so:
categories_with_allowable_items = [
category for category in
Category.objects.all() if
set([item.id for item in category.item_set.all()]) <= allowable_items
]
but we don't really have to check all categories, as categories_with_allowable_items is always going to be a subset of the categories related to all items with ids in allowable_items... so that's all we have to check (and this should be faster):
categories_with_allowable_items = set([
item.category for item in
Item.objects.select_related('category').filter(pk__in=allowable_items) if
set([siblingitem.id for siblingitem in item.category.item_set.all()]) <= allowable_items
])
if performance isn't really an issue, then the latter of these two (if not the former) should be fine. if these are very large tables, you might have to come up with a more sophisticated solution. also if you're using a particularly old version of python remember that you'll have to import the sets module
I've played around with this a bit. If QuerySet.extra() accepted a "having" parameter I think it would be possible to do it in the ORM with a bit of raw SQL in the HAVING clause. But it doesn't, so I think you'd have to write the whole query in raw SQL if you want the database doing the work.
EDIT:
This is the query that gets you part way there:
from django.db.models import Count
Category.objects.annotate(num_items=Count('item')).filter(num_items=...)
The problem is that for the query to work, "..." needs to be a correlated subquery that looks up, for each category, the number of its items in allowed_items. If .extra had a "having" argument, you'd do it like this:
Category.objects.annotate(num_items=Count('item')).extra(having="num_items=(SELECT COUNT(*) FROM app_item WHERE app_item.id in % AND app_item.cat_id = app_category.id)", having_params=[allowed_item_ids])