Django: union of different queryset on the same model

I'm programming a search on a model and I have a problem.
My model is almost like:
class Serials(models.Model):
    id = models.AutoField(primary_key=True)
    code = models.CharField("Code", max_length=50)
    name = models.CharField("Name", max_length=2000)
and I have in the database tuples like these:
1 BOSTON The new Boston
2 NYT New York journal
3 NEWTON The old journal of Mass
4 ANEWVIEW The view of the young people
If I search for the string new, what I want to have is:
first the names that start with the string
then the codes that start with the string
then the names that contain the string
then the codes that contain the string
So the previous list should appear in the following way:
2 NYT New York journal
3 NEWTON The old journal of Mass
1 BOSTON The new Boston
4 ANEWVIEW The view of the young people
The only way I found to get this kind of result is to run separate searches (if I put "OR" in a single search, I lose the order I want).
My problem is that the template code that shows the result is really redundant and honestly very ugly, because I have to repeat the same code for all four querysets. And the worst thing is that I cannot use pagination!
Now, since the structure of the different querysets is the same, I'm wondering if there is a way to join the four querysets and give the template only one queryset.

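You can run those four queries and then chain the results inside your program:
import itertools
result = itertools.chain(qs1, qs2, qs3, qs4)
but this isn't very nice, because you have to run four queries. Note that itertools.chain returns a one-shot iterator, so wrap it in list() if you need its length, for example for pagination.
You can also write the SQL yourself and run it as a raw query, for example: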
Serials.objects.raw(sql_string)
Also look at this:
How to combine 2 or more querysets in a Django view?
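As a rough sketch of the raw SQL route, the four priorities can be expressed as a CASE expression in ORDER BY, so a single query returns the rows already ranked. The example below uses the standard-library sqlite3 module instead of Django so that it runs on its own; the table name serials and the helper ranked_search are assumptions made for illustration (Django's real table name would normally carry an app prefix), and SQLite's LIKE is case-insensitive for ASCII, which other databases may not be.

```python
import sqlite3

# In-memory stand-in for the Serials table from the question (hypothetical name).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE serials (id INTEGER PRIMARY KEY, code TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO serials (id, code, name) VALUES (?, ?, ?)",
    [
        (1, "BOSTON", "The new Boston"),
        (2, "NYT", "New York journal"),
        (3, "NEWTON", "The old journal of Mass"),
        (4, "ANEWVIEW", "The view of the young people"),
    ],
)

RANKED_SEARCH_SQL = """
    SELECT id, code, name
    FROM serials
    WHERE name LIKE :contains OR code LIKE :contains
    ORDER BY
        CASE
            WHEN name LIKE :prefix THEN 0    -- name starts with the term
            WHEN code LIKE :prefix THEN 1    -- code starts with the term
            WHEN name LIKE :contains THEN 2  -- name contains the term
            ELSE 3                           -- only the code contains the term
        END,
        id
"""


def ranked_search(term):
    """Return matching rows ordered by the four priorities from the question.

    Note: '%' and '_' inside the term are not escaped in this sketch.
    """
    params = {"prefix": term + "%", "contains": "%" + term + "%"}
    return conn.execute(RANKED_SEARCH_SQL, params).fetchall()


ranked_ids = [row[0] for row in ranked_search("new")]
print(ranked_ids)
```

Because the ranking happens inside one query, the result is a single ordered sequence, which is what the template and the paginator need.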

You should also be able to do qs1 | qs2 | qs3 | qs4. This will give you duplicates, however.
What you might want to look into is Q() objects:
from django.db.models import Q
value = "new"
Serials.objects.filter(Q(name__startswith=value) |
                       Q(code__startswith=value) |
                       Q(name__contains=value) |
                       Q(code__contains=value)).distinct()
I'm not sure if it will handle the ordering if you do it this way, as this would rely on the db doing that.
Indeed, even using qs1 | qs2 may cause the order to be determined by the db. That might be the drawback (and reason why you might need at least two queries).
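If the database ordering cannot be relied on, one option is to fetch the matches with a single OR query and rank them in Python afterwards. The sketch below is plain Python with no Django dependency: Serial is a hypothetical namedtuple standing in for model instances, and match_rank and rank_matches are illustrative helper names, not part of any library. It compares case-insensitively, which is an assumption about the desired behaviour.

```python
from collections import namedtuple

# Plain stand-in for a Serials row; a real queryset would yield model instances.
Serial = namedtuple("Serial", ["id", "code", "name"])


def match_rank(row, term):
    """Return 0-3 for the four priorities in the question, or None if no match."""
    term = term.lower()
    name, code = row.name.lower(), row.code.lower()
    if name.startswith(term):
        return 0
    if code.startswith(term):
        return 1
    if term in name:
        return 2
    if term in code:
        return 3
    return None


def rank_matches(rows, term):
    """Keep the matching rows and sort them by priority (list.sort is stable)."""
    ranked = [(match_rank(row, term), row) for row in rows]
    matches = [(rank, row) for rank, row in ranked if rank is not None]
    matches.sort(key=lambda pair: pair[0])
    return [row for _, row in matches]


rows = [
    Serial(1, "BOSTON", "The new Boston"),
    Serial(2, "NYT", "New York journal"),
    Serial(3, "NEWTON", "The old journal of Mass"),
    Serial(4, "ANEWVIEW", "The view of the young people"),
]
results = rank_matches(rows, "new")
result_ids = [row.id for row in results]
print(result_ids)
```

The result is an ordinary list, so it can be handed to the template as one sequence. The trade-off is that every matching row is loaded into memory before sorting.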

Related

Filter multiple Django model fields with variable number of arguments

I'm implementing search functionality with an option of looking for a record by matching multiple tables and multiple fields in these tables.
Say I want to find a Customer by his/her first or last name, or by ID of placed Order which is stored in different model than Customer.
The easy scenario which I already implemented is that a user only types single word into search field, I then use Django Q to query Order model using direct field reference or related_query_name reference like:
result = Order.objects.filter(
    Q(customer__first_name__icontains=user_input)
    | Q(customer__last_name__icontains=user_input)
    | Q(order_id__icontains=user_input)
).distinct()
Piece of cake, no problems at all.
But what if the user wants to narrow the search and types multiple words into the search field?
Example: the user has typed Bruce and got a whole lot of records back.
Now he/she wants to be more specific and adds the customer's last name, so the search becomes Bruce Wayne. After splitting this into separate parts I have Bruce and Wayne. Obviously I don't want to search the Order model here, because order_id is a single-word value that is sufficient to find the customer on its own, so in this case I drop it from the query entirely.
Now I'm trying to match the customer by both first AND last name. I also want to handle the case where the order of the words is arbitrary, so that Bruce Wayne and Wayne Bruce are both handled properly: I still have the customer's full name, but the positions of the first and last name aren't fixed.
And this is the question I'm looking for an answer to: how do I build a query that searches multiple fields of a model without knowing which search word belongs to which field?
I'm guessing the solution is trivial and there is surely an elegant way to create such a dynamic query, but I can't think of how.

You can dynamically OR a variable number of Q objects together to achieve your desired search. The approach below makes it trivial to add or remove fields you want to include in the search.
from functools import reduce
from operator import or_
from django.db.models import Q

fields = (
    'customer__first_name__icontains',
    'customer__last_name__icontains',
    'order_id__icontains'
)
parts = []
terms = ["Bruce", "Wayne"]  # produce this from your search input field
for term in terms:
    for field in fields:
        # Build one Q object per (field, term) pair, e.g. Q(order_id__icontains="Bruce")
        parts.append(Q(**{field: term}))
query = reduce(or_, parts)
result = Order.objects.filter(query).distinct()
The use of reduce combines the Q objects by ORing them together. Credit for that part goes to this answer.
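To see what reduce(or_, parts) does without a Django project at hand, here is a small sketch using a hypothetical Clause class in place of Q. Clause is not a Django API; it only records the expression text, so the left-to-right folding becomes visible. Only two of the three fields are used to keep the output short.

```python
from functools import reduce
from itertools import product
from operator import or_


class Clause:
    """Hypothetical stand-in for django.db.models.Q that only records its text."""

    def __init__(self, text):
        self.text = text

    def __or__(self, other):
        # Q objects also implement |; here we just build a readable string.
        return Clause("({} OR {})".format(self.text, other.text))


fields = ("customer__first_name__icontains", "order_id__icontains")
terms = ["Bruce", "Wayne"]

# One clause per (term, field) pair, in the same order as the nested loops above.
parts = [
    Clause("{}={!r}".format(field, term))
    for term, field in product(terms, fields)
]

# reduce folds the list pairwise: ((p0 | p1) | p2) | p3
combined = reduce(or_, parts)
print(combined.text)
```

Swapping or_ for and_ (with a matching __and__ method) would fold the same list with AND instead, which is the building block the stricter multi-word search needs.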

The solution I came up with is rather complex, but it works exactly the way I wanted to handle this problem:
from functools import reduce
from operator import and_, or_
from django.db.models import Q

search_keys = user_input.split()
if len(search_keys) > 1:
    # Several words: some word must match the first name AND some word the last name
    first_name_set = set()
    last_name_set = set()
    for key in search_keys:
        first_name_set.add(Q(customer__first_name__icontains=key))
        last_name_set.add(Q(customer__last_name__icontains=key))
    query = reduce(and_, [reduce(or_, first_name_set), reduce(or_, last_name_set)])
else:
    # Single word: match it against any of the three fields
    search_fields = [
        Q(customer__first_name__icontains=user_input),
        Q(customer__last_name__icontains=user_input),
        Q(order_id__icontains=user_input),
    ]
    query = reduce(or_, search_fields)
result = Order.objects.filter(query).distinct()
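The multi-word branch above boils down to "(some word matches the first name) AND (some word matches the last name)". The plain-Python sketch below restates that predicate so its behaviour can be checked without a database. The helper names icontains and matches_multiword are illustrative only, and the sketch covers just the multi-word branch, not the single-word fallback.

```python
def icontains(value, key):
    """Case-insensitive substring test, mirroring the __icontains lookup."""
    return key.lower() in value.lower()


def matches_multiword(first_name, last_name, user_input):
    """True if some word matches the first name and some word matches the last name."""
    keys = user_input.split()
    if not keys:
        return False
    first_ok = any(icontains(first_name, key) for key in keys)
    last_ok = any(icontains(last_name, key) for key in keys)
    return first_ok and last_ok


# Word order does not matter.
print(matches_multiword("Bruce", "Wayne", "Bruce Wayne"))
print(matches_multiword("Bruce", "Wayne", "Wayne Bruce"))
# A different last name is rejected.
print(matches_multiword("Bruce", "Banner", "Bruce Wayne"))
```

One consequence of this formulation is that a single word may satisfy both sides: a customer named Wayne Wayne would match the input Bruce Wayne even though Bruce matches nothing.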

Django ORM: django aggregate over filtered reverse relation

The question is remotely related to Django ORM: filter primary model based on chronological fields from related model, by further limiting the resulting queryset.
The models
Assuming we have the following models:
class Patient(models.Model):
    name = models.CharField()
    # other fields following

class MedicalFile(models.Model):
    patient = models.ForeignKey(Patient, related_name='files')
    issuing_date = models.DateField()
    expiring_date = models.DateField()
    diagnostic = models.CharField()
The query
I need to select all the files which are valid at a specified date, most likely in the past. The problem is that for every patient there will be a small overlapping period in which the patient has two valid files. If we query for a date in that small timeframe, I need to select only the most recent file.
More to the point: consider patient John Doe. He will have a string of "uninterrupted" files starting in 2012, like this:
+---+------------+-------------+
|ID |issuing_date|expiring_date|
+---+------------+-------------+
|1 |2012-03-06 |2013-03-06 |
+---+------------+-------------+
|2 |2013-03-04 |2014-03-04 |
+---+------------+-------------+
|3 |2014-03-04 |2015-03-04 |
+---+------------+-------------+
As one can easily observe, the validity periods of these files overlap by a couple of days. For instance, on 2013-03-05 files 1 and 2 are both valid, but we consider only file 2 (the most recent one). I'm guessing that the use case isn't special: this is the case of managing subscriptions, where in order to have a continuous subscription you renew your subscription early.
Now, in my application I need to query historical data, e.g. give me all the files which were valid on 2013-03-05, considering only the "most recent" ones. I was able to solve this by using RawSQL, but I would like to have a solution without raw SQL. In the previous question, we were able to filter the "latest" file by aggregation over the reverse relation, something like:
qs = MedicalFile.objects.annotate(latest_file_date=Max('patient__files__issuing_date'))
qs = qs.filter(issuing_date=F('latest_file_date')).select_related('patient')
The problem is that we need to limit the range over which latest_file_date is computed, by filtering against 2013-03-05. But aggregate functions don't run over filtered querysets ...
The "poor" solution
I'm currently doing this with a RawSQL annotation (substitute <app> with your concrete application label):
import datetime
from django.db.models import F
from django.db.models.expressions import RawSQL

reference_date = datetime.date(year=2013, month=3, day=5)
annotation_latest_issuing_date = {
    # Correlated subquery: latest issuing date for the same patient up to the reference date
    'latest_issuing_date': RawSQL('SELECT max(file.issuing_date) '
                                  'FROM <app>_medicalfile file '
                                  'WHERE file.patient_id = <app>_medicalfile.patient_id '
                                  ' AND file.issuing_date <= %s', (reference_date, ))
}
qs = MedicalFile.objects.filter(expiring_date__gt=reference_date, issuing_date__lte=reference_date)
qs = qs.annotate(**annotation_latest_issuing_date).filter(issuing_date=F('latest_issuing_date'))
Written as such, the queryset returns the correct number of records.
Question: how can this be achieved without RawSQL and (already implied) with the same performance level?

You can use id__in and provide your nested filtered queryset (for example, all files that are valid at the given date).
valid_files = MedicalFile.objects.filter(expiring_date__gt=reference_date, issuing_date__lte=reference_date)
qs = (MedicalFile.objects
      .filter(id__in=valid_files)
      .order_by('patient__pk', '-issuing_date')
      .distinct('patient__pk'))  # field name arguments are only supported on PostgreSQL
The order_by groups the files by patient, with the latest issuing date first. distinct then retrieves that first file for each patient. However, general care is required when combining order_by and distinct: https://docs.djangoproject.com/en/1.9/ref/models/querysets/#django.db.models.query.QuerySet.distinct
Edit: Removed single patient dependence from first filter and changed latest to combination of order_by and distinct
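The selection rule that the order_by/distinct combination is meant to implement can be written out in plain Python to check it against the John Doe table from the question. This is only a sketch of the rule, not of the ORM query itself: MedicalFile here is a hypothetical namedtuple rather than the Django model, and latest_valid_files is an illustrative helper name.

```python
import datetime
from collections import namedtuple

# Hypothetical stand-in for the MedicalFile model rows.
MedicalFile = namedtuple(
    "MedicalFile", ["id", "patient", "issuing_date", "expiring_date"]
)


def latest_valid_files(files, reference_date):
    """Map each patient to their most recently issued file valid on reference_date."""
    latest = {}
    for f in files:
        # Same bounds as issuing_date__lte / expiring_date__gt in the question.
        if not (f.issuing_date <= reference_date < f.expiring_date):
            continue
        current = latest.get(f.patient)
        if current is None or f.issuing_date > current.issuing_date:
            latest[f.patient] = f
    return latest


d = datetime.date
files = [
    MedicalFile(1, "John Doe", d(2012, 3, 6), d(2013, 3, 6)),
    MedicalFile(2, "John Doe", d(2013, 3, 4), d(2014, 3, 4)),
    MedicalFile(3, "John Doe", d(2014, 3, 4), d(2015, 3, 4)),
]

in_overlap = latest_valid_files(files, d(2013, 3, 5))
print(in_overlap["John Doe"].id)
```

On 2013-03-05 both file 1 and file 2 pass the validity check, and the later issuing date decides in favour of file 2.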

Assume p is a Patient instance and some_date is the date you are interested in.
I think you can do something like:
p.files.filter(issuing_date__lt=some_date, expiring_date__gt=some_date)
See https://docs.djangoproject.com/en/1.9/topics/db/queries/#backwards-related-objects
Or maybe with the Q magic query object...

optimal django manytomany query

I'm having trouble reducing the number of queries for a particular view. It's a fairly heavy one but I'm sure it can be reduced:
Profile:
    name = CharField()
Officers:
    club = ManyToManyField(Club, related_name='officers')
    title = CharField()
Club:
    name = CharField()
    members = ManyToManyField(Profile)
Election:
    club = ForeignKey(Club)
    elected = ForeignKey(Profile)
    title = CharField()
    when = DateTimeField()
Clubs have members and officers (president, tournament director). People can be members of multiple clubs etc...
Officers are elected at elections, the results of which are stored.
Given a player, how can I find the most recently elected officer at each of the player's clubs?
At the moment I have
clubs = Club.objects.filter(members=me).prefetch_related('officers')
for c in clubs:
    officers = c.officers.all()
    most_recent = Election.objects.filter(club=c).filter(elected__in=officers).order_by('-when')[:1].get()
    print(c.name + ' elected ' + most_recent.elected.name + ' most recently')
The problem is the looped query: it's nice and fast if you're a member of one club, but if you join fifty my database crawls.
Edit:
The answer from Nil does what I want but doesn't get the object. I don't really need the object but I do need another field as well as the datetime. If it's helpful the query:
Club.objects.annotate(last_election=Max('election__when'))
produces the raw SQL
SELECT "organisation_club"."id", "organisation_club"."name", MAX("organisation_election"."when") AS "last_election"
FROM "organisation_club"
LEFT OUTER JOIN "organisation_election" ON ( "organisation_club"."id" = "organisation_election"."club_id" )
GROUP BY "organisation_club"."id", "organisation_club"."name"
I'd really like an ORM answer if at all possible (or a 'mostly' ORM answer).

I believe this is what you're looking for:
from django.db.models import Max, F
Election.objects.filter(club__members=me) \
    .annotate(max_date=Max('club__election__when')) \
    .filter(when=F('max_date')).select_related('elected')
Relations can be followed forwards and backwards again in a single statement, allowing you to annotate the max_date for any election related to the club of the current election. The F class allows you to filter a queryset based on selected fields in SQL, including any extra fields added through annotation, aggregation, joins etc.

What you want can be stated in SQL terms: query the Election table, group the rows by Club and keep only the last election of each club.
Now, how can we translate that into the Django ORM? Looking at the documentation, we learn that we can do it with an annotation. The trick is that you need to think in reverse: you want to annotate each club with its last election (that is, add a new piece of data to it). This gives us:
from django.db.models import Max
Club.objects.annotate(last_election=Max('election__when'))
# Use it in a for loop like this
for club in Club.objects.annotate(last_election=Max('election__when')):
    print(club, club.last_election)
Sadly, this only adds the date, which doesn't answer your question! You want the name or the complete Club object. I searched and I still don't know how to do it properly. If everything else fails, though, you can still use a raw SQL query in Django, using a query like the one in the first link.

The simplest way I can think of is to do part of the filtering at the application level.
If you do
e = Election.objects.filter(club__members=me).select_related('elected')
or
e = Election.objects.filter(club__in=me.club_set.all()).select_related('elected')
this is a single query and it should return all the elections that happened for all the clubs that the member me is in. Then you can use Python to pick the most recent date. Of course, if you have many elections per club, you end up fetching much more data than will be used.
Another way which should do it in two queries:
from functools import reduce
from django.db.models import Max, Q

# Get all member's clubs & most recent election
clubs = Club.objects.filter(members=me).annotate(last_election=Max('election__when'))
# Create filters for election based on the club id and the latest election time
election_Q = [Q(club__id=c.id) & Q(when=c.last_election) for c in clubs]
# Combine filters with an OR (reduce raises TypeError if the member has no clubs)
election_filter = reduce(lambda f1, f2: f1 | f2, election_Q)
# Get elections restricting by specific clubs & election date
elections = Election.objects.filter(election_filter).select_related('elected')
for e in elections:
    print('%s elected %s most recently at %s' % (e.club.name, e.elected, e.when))
This builds upon @Nil's method and uses its result to build a query in Python, then feeds it into the second query. However, there is a limit on the size of a SQL statement, and if the member is in a lot of clubs you may hit it. The limit is fairly high, though, and I've only ever reached it when importing large datasets in a single INSERT statement, so I think it should be fine for your purpose.
Sorry, I cannot think of a way for the Django ORM to link them together in a single SQL query. The Django ORM is actually quite limited for complex queries, so if you really need the efficiency I think it's probably best to write the raw SQL query.
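For reference, a single raw SQL query of the kind alluded to above could look roughly like the sketch below. It uses sqlite3 from the standard library so it runs without a Django project. All table and column names (profile, club, club_members, election) are assumptions for illustration and will differ from the names Django generates, and the sample rows are invented. Unlike the loop in the question, this sketch does not restrict the elected person to current officers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE profile (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE club (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE club_members (club_id INTEGER, profile_id INTEGER);
    CREATE TABLE election (
        id INTEGER PRIMARY KEY, club_id INTEGER, elected_id INTEGER,
        title TEXT, "when" TEXT
    );
    INSERT INTO profile VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO club VALUES (1, 'Chess'), (2, 'Go'), (3, 'Bridge');
    INSERT INTO club_members VALUES (1, 1), (2, 1);
    INSERT INTO election (club_id, elected_id, title, "when") VALUES
        (1, 2, 'President', '2014-01-10'),
        (1, 3, 'Tournament director', '2015-02-01'),
        (2, 2, 'President', '2013-05-05'),
        (3, 3, 'President', '2016-01-01');
""")

# For each club the member belongs to, keep only the election whose date
# equals the latest election date of that club (correlated subquery).
LATEST_ELECTIONS_SQL = """
    SELECT c.name, p.name, e.title, e."when"
    FROM election e
    JOIN club c ON c.id = e.club_id
    JOIN profile p ON p.id = e.elected_id
    JOIN club_members m ON m.club_id = c.id
    WHERE m.profile_id = ?
      AND e."when" = (
          SELECT MAX(e2."when") FROM election e2 WHERE e2.club_id = e.club_id
      )
    ORDER BY c.name
"""


def latest_elections_for(profile_id):
    """Return (club, elected, title, when) for the latest election of each club."""
    return conn.execute(LATEST_ELECTIONS_SQL, (profile_id,)).fetchall()


latest = latest_elections_for(1)
for club_name, elected_name, title, when in latest:
    print("%s elected %s (%s) most recently at %s" % (club_name, elected_name, title, when))
```

The dates are stored as ISO 8601 strings here, which compare correctly as text. Two elections on exactly the same timestamp in one club would both be returned.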

Select DISTINCT individual columns in django?

I'm curious if there's any way to do a query in Django that's not a "SELECT * FROM..." underneath. I'm trying to do a "SELECT DISTINCT columnName FROM ..." instead.
Specifically I have a model that looks like:
class ProductOrder(models.Model):
    Product = models.CharField(max_length=20, primary_key=True)
    Category = models.CharField(max_length=30)
    Rank = models.IntegerField()
where the Rank is a rank within a Category. I'd like to be able to iterate over all the Categories doing some operation on each rank within that category.
I'd like to first get a list of all the categories in the system and then query for all products in that category and repeat until every category is processed.
I'd rather avoid raw SQL, but if I have to go there, that'd be fine. Though I've never coded raw SQL in Django/Python before.

One way to get the list of distinct values of a column from the database is to use distinct() in conjunction with values().
In your case you can do the following to get the names of distinct categories:
q = ProductOrder.objects.values('Category').distinct()
print(q.query)  # See for yourself.
# The query would look something like
# SELECT DISTINCT "app_productorder"."category" FROM "app_productorder"
There are a couple of things to remember here. First, this will return a ValuesQuerySet which behaves differently from a QuerySet. When you access say, the first element of q (above) you'll get a dictionary, NOT an instance of ProductOrder.
Second, it would be a good idea to read the warning note in the docs about using distinct(). The above example will work but all combinations of distinct() and values() may not.
PS: it is a good idea to use lower case names for fields in a model. In your case this would mean rewriting your model as shown below:
class ProductOrder(models.Model):
    product = models.CharField(max_length=20, primary_key=True)
    category = models.CharField(max_length=30)
    rank = models.IntegerField()
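The SELECT DISTINCT statement that values(...).distinct() is expected to produce, and the "categories first, then products per category" loop from the question, can be tried directly with sqlite3. This is a sketch at the SQL level only: the table name app_productorder follows the query comment above, while the sample rows and the products_by_category variable are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_productorder "
    "(product TEXT PRIMARY KEY, category TEXT, rank INTEGER)"
)
conn.executemany(
    "INSERT INTO app_productorder (product, category, rank) VALUES (?, ?, ?)",
    [
        ("apple", "fruit", 2),
        ("pear", "fruit", 1),
        ("carrot", "vegetable", 1),
    ],
)

# Step 1: one row per distinct category instead of one row per product.
categories = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT category FROM app_productorder ORDER BY category"
    )
]

# Step 2: process each category's products in rank order.
products_by_category = {}
for category in categories:
    products_by_category[category] = conn.execute(
        "SELECT product, rank FROM app_productorder "
        "WHERE category = ? ORDER BY rank",
        (category,),
    ).fetchall()

print(categories)
print(products_by_category)
```

This issues one query per category plus one for the category list, which matches the iteration described in the question.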

It's actually quite simple if you're using PostgreSQL: just use distinct(columns) (documentation).
ProductOrder.objects.all().distinct('category')
Note that this feature has been included in Django since 1.4.

Use order_by on that field, and then apply distinct.
ProductOrder.objects.order_by('category').values_list('category', flat=True).distinct()

The other answers are fine, but this is a little cleaner, in that it only gives the values like you would get from a DISTINCT query, without any cruft from Django.
>>> set(ProductOrder.objects.values_list('category', flat=True))
{u'category1', u'category2', u'category3', u'category4'}
or
>>> list(set(ProductOrder.objects.values_list('category', flat=True)))
[u'category1', u'category2', u'category3', u'category4']
And, it works without PostgreSQL.
This is less efficient than using .distinct(), presuming that DISTINCT in your database is faster than a Python set, but it's great for noodling around in the shell.
Update:
This answer is great for making queries in the Django shell during development. DO NOT use this solution in production unless you are absolutely certain that you will always have a trivially small number of results before set is applied. Otherwise, it's a terrible idea from a performance standpoint.

Django DB, finding Categories whose Items are all in a subset

I have two models:
class Category(models.Model):
    pass

class Item(models.Model):
    cat = models.ForeignKey(Category)
I am trying to return all Categories for which all of that category's items belong to a given subset of item ids. For example, all categories for which all of the items associated with that category have ids in the set [1,3,5].
How could this be done using Django's query syntax (as of 1.1 beta)? Ideally, all the work should be done in the database.

Category.objects.filter(item__id__in=[1, 3, 5])
Django creates the reverse relationship on the model without the foreign key. You can filter on it by using its related name (usually just the lowercase model name, but it can be overridden manually), two underscores, and the field name you want to query on.

Let's say you require all items to be in the following set:
allowable_items = set([1, 3, 4])
One brute-force solution would be to check the item_set for every category, like so:
categories_with_allowable_items = [
    category for category in
    Category.objects.all() if
    set([item.id for item in category.item_set.all()]) <= allowable_items
]
But we don't really have to check all categories, as categories_with_allowable_items is always going to be a subset of the categories related to the items with ids in allowable_items... so that's all we have to check (and this should be faster):
categories_with_allowable_items = set([
    item.cat for item in
    Item.objects.select_related('cat').filter(pk__in=allowable_items) if
    set([siblingitem.id for siblingitem in item.cat.item_set.all()]) <= allowable_items
])
If performance isn't really an issue, then the latter of these two (if not the former) should be fine. If these are very large tables, you might have to come up with a more sophisticated solution. Also, if you're using a particularly old version of Python, remember that you'll have to import the sets module.

I've played around with this a bit. If QuerySet.extra() accepted a "having" parameter I think it would be possible to do it in the ORM with a bit of raw SQL in the HAVING clause. But it doesn't, so I think you'd have to write the whole query in raw SQL if you want the database doing the work.
EDIT:
This is the query that gets you part way there:
from django.db.models import Count
Category.objects.annotate(num_items=Count('item')).filter(num_items=...)
The problem is that for the query to work, "..." needs to be a correlated subquery that looks up, for each category, the number of its items in allowed_item_ids. If .extra had a "having" argument, you'd do it like this:
Category.objects.annotate(num_items=Count('item')).extra(having="num_items=(SELECT COUNT(*) FROM app_item WHERE app_item.id IN %s AND app_item.cat_id = app_category.id)", having_params=[allowed_item_ids])
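Written entirely in raw SQL, the same idea fits in one GROUP BY/HAVING query: a category qualifies when its total item count equals the count of its items inside the allowed set. The sketch below runs it with sqlite3 so the logic can be checked in isolation. The table and column names follow the hypothetical app_item naming used above, the sample rows are invented, and categories_within is an illustrative helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_item (id INTEGER PRIMARY KEY, cat_id INTEGER);
    -- category 1 owns items 1 and 3, category 2 owns 5 and 6, category 3 owns 2
    INSERT INTO app_item VALUES (1, 1), (3, 1), (5, 2), (6, 2), (2, 3);
""")


def categories_within(allowed_item_ids):
    """Return ids of categories whose items ALL have ids in allowed_item_ids."""
    allowed = list(allowed_item_ids)
    if not allowed:
        return []
    # Only "?" placeholders are interpolated; the ids are bound as parameters.
    placeholders = ", ".join("?" for _ in allowed)
    sql = """
        SELECT i.cat_id
        FROM app_item i
        GROUP BY i.cat_id
        HAVING COUNT(*) = SUM(CASE WHEN i.id IN ({}) THEN 1 ELSE 0 END)
        ORDER BY i.cat_id
    """.format(placeholders)
    return [row[0] for row in conn.execute(sql, allowed)]


print(categories_within([1, 3, 5]))
```

Because the query starts from the item table, categories without any items never appear in the result. Whether such empty categories should count as matching is a design decision the question leaves open.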
