I have a filter that should return a queryset with two objects that differ in one field. For example:
obj_1 = (name='John', age='23', is_fielder=True)
obj_2 = (name='John', age='23', is_fielder=False)
Both objects are instances of the same model but have different primary keys. I tried using the filter below:
qs = Model.objects.filter(name='John', age='23').annotate(is_fielder=F('plays__outdoor_game_role')=='Fielder')
This is the first time I have used annotate, and it gave me the error below:
TypeError: QuerySet.annotate() received non-expression(s): False.
I am new to Django. What am I doing wrong, and what should the annotate call look like to get the objects shown above?
The solution by @ktowen works well and is quite straightforward.
Here is another solution that I use; I hope it is helpful too.
from django.db.models import BooleanField, ExpressionWrapper, Q
queryset = queryset.annotate(
    is_fielder=ExpressionWrapper(
        Q(plays__outdoor_game_role='Fielder'),
        output_field=BooleanField(),
    ),
)
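For context on the error in the question: Python evaluates F('plays__outdoor_game_role') == 'Fielder' before annotate() is ever called, so annotate() receives a plain bool instead of an expression. Here is a minimal sketch of that evaluation order. FieldRef and fake_annotate are hypothetical stand-ins written for this illustration; they are not Django's F or annotate().

```python
class FieldRef:
    """Hypothetical stand-in for a field reference; not Django's F."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Ordinary value comparison: equal only to another FieldRef
        # with the same name, so comparing with a string gives False.
        return isinstance(other, FieldRef) and self.name == other.name

    def __hash__(self):
        return hash(self.name)


def fake_annotate(**annotations):
    """Hypothetical stand-in that rejects plain Python values."""
    for alias, value in annotations.items():
        if isinstance(value, (bool, int, str)):
            raise TypeError(f"received non-expression(s): {value}.")
    return annotations


# The comparison runs first and yields a bool, not a database expression.
result = FieldRef("plays__outdoor_game_role") == "Fielder"

try:
    fake_annotate(is_fielder=result)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

Wrapping the condition in Q(...) or Case/When keeps it as an expression that the database evaluates.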
Here are some explanations for those who are not familiar with Django ORM:
annotate() creates a new column/field on the fly, in this case is_fielder. This means your model does not need a field named is_fielder, yet every object in the resulting queryset has an is_fielder attribute (for example, obj.is_fielder), and you can filter or order by it. annotate() is extremely useful and flexible, combines with almost every other expression, and is a must-know method in the Django ORM.
ExpressionWrapper lets you wrap a more complicated combination of conditions, in the form ExpressionWrapper(expression, output_field). It is useful when you combine different types of fields or need to specify an output type that Django cannot infer automatically.
A Q object is a frequently used expression for specifying a condition. I think its most powerful feature is that conditions can be combined:
AND (&): filter(Q(condition1) & Q(condition2))
OR (|): filter(Q(condition1) | Q(condition2))
NOT (~): filter(~Q(condition))
Q objects can be mixed with ordinary keyword conditions, like this:
filter(Q(condition1), id__in=[list])
The Q object must come first, because Python requires positional arguments to precede keyword arguments; otherwise it will not work.
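The ordering requirement comes from Python itself rather than from Django. A small sketch shows this, assuming a hypothetical fake_filter function that stands in for QuerySet.filter() and a string that stands in for the Q object:

```python
def fake_filter(*args, **kwargs):
    """Hypothetical stand-in for QuerySet.filter(); just returns its arguments."""
    return args, kwargs


# Positional condition first, keyword condition second: accepted.
args, kwargs = fake_filter("Q(condition1)", id__in=[1, 2])

# Reversed order is rejected by Python before any function is called.
try:
    compile('fake_filter(id__in=[1, 2], "Q(condition1)")', "<example>", "eval")
    rejected = False
except SyntaxError:
    rejected = True
```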
Case/When(then) can be thought of as if con1 ... elif con2 ... elif con3 .... It is quite powerful, and personally I love using it to build a custom ordering for a queryset.
For example, suppose you need to return a queryset of watch history items in the order in which the user watched them. You could keep the order with a for loop, but that would generate many similar queries. A more elegant way with Case/When would be:
from django.db.models import Case, When
item_ids = [list]  # placeholder for the primary keys, in watch order
ordering = Case(*[When(pk=pk, then=pos)
                  for pos, pk in enumerate(item_ids)])
watch_history = Item.objects.filter(id__in=item_ids)\
    .order_by(ordering)
As you can see, Case/When(then) lets you bind such concrete relations. It is 1) a precise, pinpoint condition expression and 2) especially useful when several conditions must be checked in sequence.
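To make the ordering trick concrete, here is a plain-Python analogue of what the Case/When expression computes. Each primary key is mapped to its position in item_ids, and rows are sorted by that position. The sample IDs and rows are made up for illustration, and in Django the sorting happens in the database, not in Python.

```python
# Hypothetical watch history: primary keys in the order the user watched them.
item_ids = [42, 7, 19]

# Same mapping the Case/When list builds: pk -> position in item_ids.
position = {pk: pos for pos, pk in enumerate(item_ids)}

# Rows as a database might return them without explicit ordering.
rows = [{"pk": 7}, {"pk": 19}, {"pk": 42}]

ordered = sorted(rows, key=lambda row: position[row["pk"]])
ordered_pks = [row["pk"] for row in ordered]
```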
You can use Case/When with annotate:
from django.db.models import Case, BooleanField, Value, When
Model.objects.filter(name='John', age='23').annotate(
    is_fielder=Case(
        When(plays__outdoor_game_role='Fielder', then=Value(True)),
        default=Value(False),
        output_field=BooleanField(),
    ),
)
I am using a custom Prefetch object to fetch only some related objects, for example:
unreleased_prefetch = Prefetch("chants", Chant.objects.with_audio())
teams = Team.objects.public().prefetch_related(unreleased_prefetch)
This works well, but I also want to know the count of these objects and filter by it. I am happy that I can currently pass a queryset as a parameter to the Prefetch object (I rely heavily on custom QuerySets/Managers).
Is there a way to reuse the query that I pass to the Prefetch object in a conditional annotate as well?
So far my conditional annotate is quite ugly and looks like this (it does the same thing as my original with_audio custom query/filter on Chant):
.annotate(
    unreleased_count=Count(Case(
        When(chants__has_audio_versions=True, chants__has_audio=True, chants__flag_reject=False,
             chants__active=False, then=1),
        output_field=IntegerField()))
).filter(unreleased_count__gt=0)
It works, but is quite ugly and has duplicated logic.
Is there a way to pass a queryset to When in the same way I can pass it to Prefetch, to avoid the duplication?
Not saying this is the best practice or anything, but wanted to provide a potential way of dealing with such a situation.
Let's say you have a ChantQuerySet class:
class ChantQuerySet(models.QuerySet):
    def with_audio(self):
        return self.filter(chants__has_audio_versions=True, chants__has_audio=True,
                           chants__flag_reject=False, chants__active=False)
You probably use it as a manager, something like this:
class Chant(models.Model):
    # ...
    objects = ChantQuerySet.as_manager()
I would suggest storing the filter in the QuerySet:
from django.db.models import Case, Count, IntegerField, Q, When
class ChantQuerySet(models.QuerySet):
    @property
    def with_audio_filter(self):
        return Q(chants__has_audio_versions=True, chants__has_audio=True,
                 chants__flag_reject=False, chants__active=False)
    def with_audio(self):
        return self.filter(self.with_audio_filter)
This gives you the ability to do this:
Chant.objects.annotate(
    unreleased_count=Count(Case(
        # Read the property from a queryset instance; on the class itself
        # it would be the property object, not a Q.
        When(Chant.objects.all().with_audio_filter, then=1),
        output_field=IntegerField()))
).filter(unreleased_count__gt=0)
Now you are able to change the filter only in one place, should you need to do so, without having to change it everywhere. To me it makes sense to store this filter in the QuerySet and personally I see nothing wrong with that, but that's just me.
One thing that I'd change though, is to either make the with_audio_filter property cached, or store it in a field in the constructor when initializing ChantQuerySet.
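The caching idea can be sketched with functools.cached_property from the standard library (Python 3.8+); Django ships a similar decorator in django.utils.functional. FilterHolder is a hypothetical stand-in for the QuerySet class and a plain dict stands in for the Q object, so this only illustrates the caching behaviour:

```python
from functools import cached_property


class FilterHolder:
    """Hypothetical stand-in for ChantQuerySet; not a real QuerySet."""

    build_count = 0  # counts how often the filter is actually built

    @cached_property
    def with_audio_filter(self):
        FilterHolder.build_count += 1
        # A dict stands in for the Q(...) object from the answer.
        return {
            "chants__has_audio_versions": True,
            "chants__has_audio": True,
            "chants__flag_reject": False,
            "chants__active": False,
        }


holder = FilterHolder()
first = holder.with_audio_filter
second = holder.with_audio_filter  # served from the instance cache
```

The cache lives on the instance, so each new queryset object would still build the filter once.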
I am using Django with mongoengine. I have a model Classes with an inscriptions list, and I want to get the documents whose list contains a given id.
classes = Classes.objects.filter(inscriptions__contains=request.data['inscription'])
Here's a general explanation of querying ArrayField membership:
Per the Django ArrayField docs, the __contains operator checks if a provided array is a subset of the values in the ArrayField.
So, to filter on whether an ArrayField contains the value "foo", you pass in a length 1 array containing the value you're looking for, like this:
# matches rows where myarrayfield is something like ['foo','bar']
Customer.objects.filter(myarrayfield__contains=['foo'])
The Django ORM produces the PostgreSQL @> operator, as you can see by printing the query:
>>> print(Customer.objects.filter(myarrayfield__contains=['foo']).only('pk').query)
SELECT "website_customer"."id" FROM "website_customer" WHERE "website_customer"."myarrayfield_" @> ['foo']::varchar(100)[]
If you provide something other than an array, you'll get a cryptic error like DataError: malformed array literal: "foo" DETAIL: Array value must start with "{" or dimension information.
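As a rough mental model (plain Python, not what the database executes), __contains behaves like a subset test between the list you pass in and the stored array. The sample rows and the array_contains helper are hypothetical:

```python
# Hypothetical stored array values, keyed by row id.
rows = {
    1: ["foo", "bar"],
    2: ["bar", "baz"],
    3: ["foo"],
}


def array_contains(stored, wanted):
    """True when every element of `wanted` appears in `stored`."""
    return set(wanted) <= set(stored)


matching_ids = [pk for pk, values in rows.items() if array_contains(values, ["foo"])]
both_ids = [pk for pk, values in rows.items() if array_contains(values, ["foo", "bar"])]
```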
Perhaps I'm missing something...but it seems that you should be using .filter():
classes = Classes.objects.filter(inscriptions__contains=request.data['inscription'])
This answer is in reference to your comment on rnevius's answer.
In the Django ORM, a database call generally returns either a QuerySet or something else, depending on the method you use: get() returns a single model instance, count() returns a number, and so on.
The result of a method that returns a QuerySet can be refined further, for example with order_by() or distinct(). QuerySets are lazy, which means they hit the database only when they are actually evaluated, not when they are assigned. You can find more information about them in the Django documentation.
Methods that do not return a QuerySet cannot be chained in this way.
Take the time to go through the QuerySet documentation; it provides more in-depth explanations with examples. Understanding this behavior helps make your application more efficient.
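The laziness described above can be illustrated with a small stand-in class. LazyQuery is hypothetical and only mimics the idea: chaining filter() records conditions, and the simulated database hit happens on iteration.

```python
class LazyQuery:
    """Hypothetical stand-in for a lazy QuerySet; not Django code."""

    hits = 0  # number of simulated database hits

    def __init__(self, rows, predicates=()):
        self._rows = rows
        self._predicates = tuple(predicates)

    def filter(self, predicate):
        # Returns a new lazy object; nothing is evaluated yet.
        return LazyQuery(self._rows, self._predicates + (predicate,))

    def __iter__(self):
        # Evaluation happens only here, when results are actually needed.
        LazyQuery.hits += 1
        return iter(
            [row for row in self._rows if all(p(row) for p in self._predicates)]
        )


query = LazyQuery([1, 2, 3, 4, 5, 6])
refined = query.filter(lambda n: n % 2 == 0).filter(lambda n: n > 2)
hits_before = LazyQuery.hits  # only assignments so far

results = list(refined)
hits_after = LazyQuery.hits
```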
Having the model:
class Notebook(models.Model):
    n_id = models.AutoField(primary_key = True)
class Note(models.Model):
    b_nbook = models.ForeignKey(Notebook)
the URL pattern passing one parameter:
(r'^(?P<n_id>\d+)/$', 'notebook_notes')
and the following view:
def notebook_notes(request, n_id):
    nbook = get_object_or_404(Notebook, pk=n_id)
    ...
Which of the following is the better queryset, and why? (Both work and pass the notes for the notebook selected by the URL.)
notes = nbook.note_set.filter(b_nbook = n_id)
notes = Note.objects.select_related().filter(b_nbook = n_id)
Well, you're comparing apples and oranges a bit there. They may return virtually the same results, but you're doing different things in each.
Let's take the relational version first. That query is saying get all the notes that belong to nbook. You're then filtering that queryset by only notes that belong to nbook. You're filtering it twice on the same criteria, in effect. Since Django's querysets are lazy, it doesn't really do anything bad, like hit the database multiple times, but it's still unnecessary.
Now, the second version. Here, you're starting with all notes and filtering to just those that belong to the particular notebook. There's only one filter this time, but it's bad form to do it this way. Since it's a relation, you should look it up through the relational format, i.e. nbook.note_set.all(). On this version, though, you're also using select_related(), which wasn't used on the other version.
select_related() will attempt to join in the related objects of the model, in this case Note, in the same query. However, since the only relation on Note is Notebook and you already have the notebook, it's redundant.
Taking out all the redundancy in those two versions leaves you with just:
notes = nbook.note_set.all()
That, too, will return exactly the same results as the other two versions, but is much cleaner and more standard.
The short of it is, the table names of all queries that are inside a filter get renamed to u0, u1, ..., so my extra where clauses won't know what table to point to. I would love to not have to hand-make all the queries for every way I might subselect on this data, and my current workaround is to turn my extra'd queries into pk values_lists, but those are really slow and something of an abomination.
Here's what this all looks like. You can mostly ignore the details of what goes in the extra of this manager method, except the first sql line which points to products_product.id:
def by_status(self, *statii):
    return self.extra(where=["""products_product.id IN
        (SELECT recent.product_id
            FROM (
                SELECT product_id, MAX(start_date) AS latest
                FROM products_productstatus
                GROUP BY product_id
            ) AS recent
            JOIN products_productstatus AS ps ON ps.product_id = recent.product_id
            WHERE ps.start_date = recent.latest
            AND ps.status IN (%s))""" % (', '.join([str(stat) for stat in statii]),)])
Which works wonderfully for all the situations involving only the products_product table.
When I want these products as a subselect, I do:
Piece.objects.filter(
    product__in=Product.objects.filter(
        pk__in=list(
            Product.objects.by_status(FEATURED).values_list('id', flat=True))))
How can I keep the generalized abilities of a query set, yet still use an extra where clause?
First of all, the issue is not totally clear to me. Is the second code block in your question the actual code you want to execute? If so, the query should work as expected, since no subselect is performed.
I assume that you want to use the second code block without the list() around the subselect, to avoid performing a second query.
The Django documentation mentions this issue in its description of the extra() method. However, it's not easy to overcome.
The easiest but most "hackish" solution is to observe which table alias Django produces for the table you want to query in the extra method. You can rely on the alias staying the same as long as you always construct the query in the same fashion (you don't change the order of multiple extra methods or filter calls that cause a join).
You can inspect the query that will be executed for a queryset by using:
print(Model.objects.filter(...).query)
This will reveal the aliases that are used for the tables you want to query.
As of Django 1.11, you should be able to use Subquery and OuterRef to generate an equivalent query to your extra (using a correlated subquery rather than a join):
from django.db.models import Max, OuterRef, Subquery
def by_status(self, *statii):
    return self.filter(
        id__in=Subquery(
            ProductStatus.objects.values("product_id").filter(
                status__in=statii,
                product__in=Subquery(
                    ProductStatus.objects.values(
                        "product_id",
                    ).annotate(
                        latest=Max("start_date"),
                    ).filter(
                        latest=OuterRef("start_date"),
                    ).values("product_id"),
                ),
            ),
        ),
    )
You could probably do it with Window expressions as well (as of Django 2.0).
Note that this is untested, so may need some tweaks.
I have developed a few Django apps, all pretty straight-forward in terms of how I am interacting with the models.
I am building one now that has several different views which, for lack of a better term, are "canned" search result pages. These pages all return results from the same model, but they are filtered on different columns. One page we might be filtering on type, another we might be filtering on type and size, and on yet another we may be filtering on size only, etc...
I have written a function in views.py that is used by each of these pages. It takes kwargs containing the criteria to search on. The minimum is one filter, but one of the views has up to four.
I simply check whether the kwargs dict contains one of the filter types and, if so, filter the result on that value (I wrote this code just now, so I apologize if there are any errors, but you should get the point):
def get_search_object(**kwargs):
    q = Entry.objects.all()
    if kwargs.__contains__('the_key1'):
        q = q.filter(column1=kwargs['the_key1'])
    if kwargs.__contains__('the_key2'):
        q = q.filter(column2=kwargs['the_key2'])
    return q.distinct()
Now, according to the Django docs (http://docs.djangoproject.com/en/dev/topics/db/queries/#id3), this is fine, in that the DB will not be hit until the set is evaluated. Lately, though, I have heard that this is not the most efficient way to do it and that one should probably use Q objects instead.
I guess I am looking for an answer from other developers out there. My way currently works fine, if my way is totally wrong from a resources POV, then I will change ASAP.
Thanks in advance
Resource-wise, you're fine, but there are a lot of ways it can be stylistically improved to avoid using the double-underscore methods and to make it more flexible and easier to maintain.
If the kwargs being used are the actual column names then you should be able to pretty easily simplify it since what you're kind of doing is deconstructing the kwargs and rebuilding it manually but for only specific keywords.
def get_search_object(**kwargs):
    entries = Entry.objects.filter(**kwargs)
    return entries.distinct()
The main difference there is that it doesn't enforce that the keys be actual columns and pretty badly needs some exception handling in there. If you want to restrict it to a specific set of fields, you can specify that list and then build up a dict with the valid entries.
def get_search_object(**kwargs):
    valid_fields = ['the_key1', 'the_key2']
    filter_dict = {}
    for key in kwargs:
        if key in valid_fields:
            filter_dict[key] = kwargs[key]
    entries = Entry.objects.filter(**filter_dict)
    return entries.distinct()
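If you would rather fail loudly than silently drop unknown keys, a variant could raise an exception instead. This is a sketch of just the dict-building step in plain Python. The names build_filter_dict and VALID_FIELDS are hypothetical, as is the choice of ValueError:

```python
VALID_FIELDS = {"the_key1", "the_key2"}  # hypothetical whitelist


def build_filter_dict(**kwargs):
    """Return kwargs unchanged if every key is whitelisted, else raise."""
    unknown = sorted(set(kwargs) - VALID_FIELDS)
    if unknown:
        raise ValueError(f"Unsupported filter(s): {', '.join(unknown)}")
    return dict(kwargs)


filters = build_filter_dict(the_key1="large")

try:
    build_filter_dict(the_key1="large", colour="red")
    error = None
except ValueError as exc:
    error = str(exc)
```

The resulting dict would then be passed on as Entry.objects.filter(**filters).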
If you want a fancier solution that just checks that it's a valid field on that model, you can (ab)use _meta:
def get_search_object(**kwargs):
    valid_fields = [field.name for field in Entry._meta.fields]
    filter_dict = {}
    for key in kwargs:
        if key in valid_fields:
            filter_dict[key] = kwargs[key]
    entries = Entry.objects.filter(**filter_dict)
    return entries.distinct()
In this case, your usage is fine from an efficiency standpoint. You would only need to use Q objects if you needed to OR your filters instead of AND.