django annotate queryset with field comparison result - django

I have a queryset like this:
predicts = Prediction.objects.select_related('match').filter(match_id=pk)
I need to annotate this with a new field, is_correct. I need to compare two string fields and store the result of the comparison in this new field. The fields that I want to compare are:
predict from Prediction table
result from Match table (that has been joined through select_related)
I need to know what expression I should put inside annotate(). Below is my current code, which throws a TypeError exception:
predicts = predicts.annotate(is_correct=(F('predict') == F('result')))
All help will be greatly appreciated.
UPDATE:
I found an alternative solution that does the job for me (filtering the Prediction rows based on the Match result using filter() and exclude()), but I would still like to know how to handle this specific case, where the new annotated field is the result of a comparison between two other fields of the queryset. For those who may need it: in Django 2.2 and later, the NullIf database function compares two expressions and returns None when they are equal.

You can use the extra() method, a hook for injecting specific clauses into the SQL.
First of all, you need to know the names of the apps and models, or the names of the tables in the database.
Assume that in your case the two tables are called "app_prediction" and "app_match".
The statement would then be as follows:
Prediction.objects.select_related('match').extra(
    select={'is_correct': "app_prediction.predict = app_match.result"}
)
This adds a field called is_correct to each object in the result.
The table and column names used in the SQL fragment must match the actual names in the database.
It would be best to see the models.
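To make the idea concrete, here is a minimal sketch of the SQL that such a comparison boils down to, run against an in-memory SQLite database with the standard sqlite3 module. The table names app_prediction and app_match follow the assumption made above; the column layout and sample rows are invented for illustration. On the ORM side, a conditional expression is a common way to get the same column without raw SQL; it is shown only as a comment because it needs a configured Django project.

```python
import sqlite3

# Assumed schema: table names follow the answer above, columns are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_match (id INTEGER PRIMARY KEY, result TEXT);
    CREATE TABLE app_prediction (
        id INTEGER PRIMARY KEY,
        match_id INTEGER REFERENCES app_match(id),
        predict TEXT
    );
    INSERT INTO app_match VALUES (1, 'home');
    INSERT INTO app_prediction VALUES (1, 1, 'home'), (2, 1, 'away');
""")

# The comparison is evaluated per joined row and exposed as an extra column.
rows = conn.execute("""
    SELECT p.id, p.predict = m.result AS is_correct
    FROM app_prediction AS p
    JOIN app_match AS m ON m.id = p.match_id
    WHERE p.match_id = ?
    ORDER BY p.id
""", (1,)).fetchall()
print(rows)

# Possible ORM counterpart (not executed here):
# from django.db.models import BooleanField, Case, F, Value, When
# predicts.annotate(is_correct=Case(
#     When(predict=F('match__result'), then=Value(True)),
#     default=Value(False),
#     output_field=BooleanField(),
# ))
```

SQLite reports the comparison as 1 or 0; other databases may return a native boolean.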

Related

Problem with .only() method, passing to Pagination / Serialization --- all fields are getting returned instead of the ones specified in only()

I am trying to load some data into datatables. I am trying to specify columns in the model.objects query by using .only() --- at first glance at the resulting QuerySet, it does in fact look like the MySQL query is only asking for those columns.
However, when I try to pass the QuerySet into Paginator and/or a Serializer, the result has ALL columns in it.
I cannot use .values_list() because that does not return the nested objects that I need to have serialized as part of my specific column ask. I am not sure what is happening to my .only()
db_result_object = model.objects.prefetch_related().filter(qs).order_by(asc+sort_by).only(*columns_to_return)
paginated_results = Paginator(db_result_object,results_per_page)
serialized_results = serializer(paginated_results.object_list,many=True)
paginated_results.object_list = serialized_results.data
return paginated_results
This one has tripped me up too. In Django, calling only() doesn't give you objects that are limited to the columns of a SQL statement like this:
SELECT col_to_return_1, ... col_to_return_n
FROM appname_model
Part of the picture is that QuerySets are lazy: Django fetches data not when you construct the QuerySet, but when you first access data from it (see lazy QuerySets).
only() is a specific case of what are called deferred fields. The initial query loads just the fields named in only() (plus the primary key), but you still get full model instances, and every other field is loaded from the database with an extra query the first time it is accessed. A serializer that reads every field therefore ends up returning all of them. Some useful docs here.
My recommendation would be to write your serializer so that it only handles the specific fields you need, likely using a SerializerMethodField with another serializer to serialize your related fields.
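The deferred-field behaviour can be illustrated with a toy class in plain Python. This is not Django's implementation; DeferredRow, FULL_ROW and the field names are made up for the sketch. It only shows why code that touches every field (as a serializer with all fields declared does) ends up with all columns, at the cost of extra lookups.

```python
class DeferredRow:
    """Toy stand-in for a model instance with deferred fields (hypothetical)."""

    def __init__(self, loaded, fetch_missing):
        self._loaded = dict(loaded)          # fields fetched by the initial query
        self._fetch_missing = fetch_missing  # callable simulating a follow-up query
        self.extra_queries = 0

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, i.e. for field names.
        if name.startswith("_"):
            raise AttributeError(name)
        if name not in self._loaded:
            # Deferred field: simulate one extra database round trip.
            self.extra_queries += 1
            self._loaded[name] = self._fetch_missing(name)
        return self._loaded[name]


FULL_ROW = {"id": 1, "title": "Dune", "body": "long text", "author_id": 7}

# Simulates only("title"): the primary key and title are loaded up front.
row = DeferredRow({"id": 1, "title": "Dune"}, FULL_ROW.__getitem__)
print(row.title, row.extra_queries)

# A serializer that reads every field forces the deferred ones to load.
serialized = {name: getattr(row, name) for name in FULL_ROW}
print(serialized, row.extra_queries)
```

Restricting the serializer to the fields passed to only() avoids both the extra columns and the extra lookups.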

Return object when aggregating grouped fields in Django

Assuming the following example model:
# models.py
from django.db import models

class event(models.Model):
    location = models.CharField(max_length=10)
    type = models.CharField(max_length=10)
    date = models.DateTimeField()
    attendance = models.IntegerField()
I want to get the attendance number for the latest date of each event location and type combination, using Django ORM. According to the Django Aggregation documentation, we can achieve something close to this, using values preceding the annotation.
... the original results are grouped according to the unique combinations of the fields specified in the values() clause. An annotation is then provided for each unique group; the annotation is computed over all members of the group.
So using the example model, we can write:
event.objects.values('location', 'type').annotate(latest_date=Max('date'))
which does indeed group events by location and type, but does not return the attendance field, which is what I actually need.
Another approach I tried was to use distinct i.e.:
event.objects.distinct('location', 'type').annotate(latest_date=Max('date'))
but I get an error
NotImplementedError: annotate() + distinct(fields) is not implemented.
I found some answers which rely on database specific features of Django, but I would like to find a solution which is agnostic to the underlying relational database.
Alright, I think this one might actually work for you. It is based upon an assumption, which I think is correct.
When you create your model objects, they should all be unique. It seems highly unlikely that you would have two events on the same date, in the same location, and of the same type. So with that assumption, let's begin. (As a formatting note, class names tend to start with a capital letter to differentiate classes from variables or instances.)
from django.db.models import Max

# First you get your desired events with your criteria.
results = Event.objects.values('location', 'type').annotate(latest_date=Max('date'))
# Make an empty list to store the objects you want.
results_list = []
# Then iterate through your results, looking up the objects
# you want and populating the list (one extra query per group).
for r in results:
    result = Event.objects.get(location=r['location'], type=r['type'], date=r['latest_date'])
    results_list.append(result)
# Now you have a list of objects that you can do whatever you want with.
You might have to look up the exact output of Max('date'), but this should get you on the right path.
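As an alternative to one lookup per group, the same "latest row per (location, type)" result can be expressed as a single query with a correlated subquery, which is plain SQL and not tied to one database. The sketch below uses the standard sqlite3 module with invented sample rows and a simplified table named event; a possible ORM counterpart using Subquery and OuterRef (available since Django 1.11) appears only as a comment and has not been run here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event (
        id INTEGER PRIMARY KEY,
        location TEXT, type TEXT, date TEXT, attendance INTEGER
    );
    INSERT INTO event (location, type, date, attendance) VALUES
        ('paris', 'talk', '2020-01-01', 10),
        ('paris', 'talk', '2020-02-01', 25),
        ('rome',  'talk', '2020-01-15', 40),
        ('rome',  'demo', '2020-03-01', 5);
""")

# Keep only the rows whose date equals the newest date within their group.
latest = conn.execute("""
    SELECT e.location, e.type, e.date, e.attendance
    FROM event AS e
    WHERE e.date = (
        SELECT MAX(i.date) FROM event AS i
        WHERE i.location = e.location AND i.type = e.type
    )
    ORDER BY e.location, e.type
""").fetchall()
for row in latest:
    print(row)

# Possible ORM counterpart (not executed here):
# from django.db.models import OuterRef, Subquery
# newest = Event.objects.filter(
#     location=OuterRef('location'), type=OuterRef('type')
# ).order_by('-date')
# Event.objects.filter(pk=Subquery(newest.values('pk')[:1]))
```

Unlike the loop above, this returns full rows (including attendance) in one round trip, and picking by primary key in the ORM variant also copes with two events sharing the same latest date.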

django valueslist queryset across database engines

In one of our Django apps we use two databases, A and B; both are the same kind of database but with different schemas. A table called C exists in both schemas, but database routing always points it at database B. We built a values_list() queryset from one of the models in A and passed it to a filter on table C using the __in lookup, but it always returns nothing even though there are matching records. When we convert the values_list() queryset to a list first and use that with __in on table C, it works fine.
Not working
data = modelindbA.objects.values_list('somecolumn',flat=True)
info = C.objects.filter(somecolumn__in=data).values_list
Working
data = modelindbA.objects.values_list('somecolumn',flat=True)
data = list(data)
info = C.objects.filter(somecolumn__in=data).values_list
I have read the Django docs and other SO questions, but couldn't find anything relevant. My guess is that this fails because the two models are in different database schemas. I need assistance on how to troubleshoot this issue.
When you use a queryset with __in, Django will construct a single SQL query that uses a subquery for the __in clause. Since the two tables are in different databases, no rows will match.
By contrast, if you convert the first queryset to a list, Django will go ahead and fetch the data from the first database. When you then pass that data to the second query, hitting the second database, it will work as expected.
See the documentation for the in field lookup for more details:
You can also use a queryset to dynamically evaluate the list of values instead of providing a list of literal values.... This queryset will be evaluated as subselect statement:
SELECT ... WHERE blog.id IN (SELECT id FROM ... WHERE NAME LIKE '%Cheddar%')
Because the values_list() method returns a django.db.models.query.QuerySet, not a list.
When both models live in the same schema, the ORM folds the inner queryset into a subquery and makes just one query; when the schemas differ, that single query does not find the rows.
Just use list().
I would even recommend using it within a single schema, since it can reduce the complexity of the query and may work better on big tables.
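The difference between the two forms can be reproduced without Django. The sketch below uses two separate in-memory SQLite databases as stand-ins for A and B; the table names source and c and the sample values are assumptions made for the illustration. The subquery form runs entirely inside B, where the source table has no matching data, while the list form first pulls the values out of A.

```python
import sqlite3

# Two independent databases with the same table layout (assumed for the demo).
db_a = sqlite3.connect(":memory:")
db_b = sqlite3.connect(":memory:")
for db in (db_a, db_b):
    db.executescript("""
        CREATE TABLE source (somecolumn TEXT);
        CREATE TABLE c (somecolumn TEXT, payload TEXT);
    """)
db_a.executemany("INSERT INTO source VALUES (?)", [("x",), ("y",)])
db_b.executemany("INSERT INTO c VALUES (?, ?)", [("x", "first"), ("z", "other")])

# Subquery form: the inner SELECT is evaluated in B, where source is empty.
via_subquery = db_b.execute(
    "SELECT payload FROM c WHERE somecolumn IN (SELECT somecolumn FROM source)"
).fetchall()

# List form: fetch the values from A first, then send them to B as literals.
values = [row[0] for row in db_a.execute("SELECT somecolumn FROM source")]
placeholders = ", ".join("?" for _ in values)
via_list = db_b.execute(
    f"SELECT payload FROM c WHERE somecolumn IN ({placeholders})", values
).fetchall()

print(via_subquery)
print(via_list)
```

Calling list() on the values_list() queryset corresponds to the second form: the first database is hit immediately and only plain values reach the second one.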

Django annotate a field value to queryset

I want to attach a field value (id) to a QS like below, but Django throws a 'str' object has no attribute 'lookup' error.
Book.objects.all().annotate(some_id='somerelation__id')
It seems I can get my id value using Sum()
Book.objects.all().annotate(something=Sum('somerelation__id'))
I'm wondering is there not a way to simply annotate raw field values to a QS? Using sum() in this case doesn't feel right.
There are at least three methods of accessing related objects in a queryset.
using Django's double underscore join syntax:
If you just want to use the field of a related object as a condition in your SQL query you can refer to the field field on the related object related_object with related_object__field. All possible lookup types are listed in the Django documentation under Field lookups.
Book.objects.filter(related_object__field=True)
using annotate with F():
You can populate an annotated field in a queryset by referring to the field with the F() object. F() represents the field of a model or an annotated field.
Book.objects.annotate(added_field=F("related_object__field"))
accessing object attributes:
Once the queryset is evaluated, you can access related objects through attributes on that object.
book = Book.objects.get(pk=1)
author = book.author.name # just one author, or…
authors = book.author_set.values("name") # several authors
This triggers an additional query unless you're making use of select_related().
My advice is to go with solution #2, as you're already halfway down that road and I think it'll give you exactly what you're asking for. The problem you're facing right now is that annotate() expects an expression, but you're passing a plain string ('somerelation__id') that Django doesn't know what to do with; wrapping it in F() fixes that.
Also, the Django documentation on annotate() is pretty straightforward. You should look into that (again).
You have <somerelation>_id "by default", for example comment.user_id. It works because each Comment points to exactly one User, while a User has many Comments. But if a Book has many Authors, what is author_id supposed to be in this case?
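To show what annotating a related field amounts to, here is a rough sketch of the kind of SQL involved: a join that exposes the related row's id as an extra column. It uses the standard sqlite3 module; the book and author tables and their rows are invented, and the exact SQL Django generates will differ in detail. It also illustrates the last point: when a book has several related rows, the join yields one output row per related row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE author (id INTEGER PRIMARY KEY, book_id INTEGER, name TEXT);
    INSERT INTO book VALUES (1, 'Solo'), (2, 'Duet');
    INSERT INTO author VALUES (10, 1, 'Ann'), (20, 2, 'Bob'), (21, 2, 'Cy');
""")

# The related id is selected under an alias, similar in spirit to
# Book.objects.annotate(some_id=F('somerelation__id')).
rows = conn.execute("""
    SELECT b.title, a.id AS some_id
    FROM book AS b
    LEFT JOIN author AS a ON a.book_id = b.id
    ORDER BY b.id, a.id
""").fetchall()
print(rows)
```

The book with two related rows appears twice, once per related id, which is why there is no single "the id" to annotate in the many-related case.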

How do I use django's Q with django taggit?

I have a Result object that is tagged with "one" and "two". When I try to query for objects tagged "one" and "two", I get nothing back:
q = Result.objects.filter(Q(tags__name="one") & Q(tags__name="two"))
print(len(q))
# prints zero, was expecting 1
Why does it not work with Q? How can I make it work?
The way django-taggit implements tagging is essentially through a ManyToMany relationship. In such cases there is a separate table in the database that holds these relations. It is usually called a "through" or intermediate model, as it connects the two models. In the case of django-taggit this is called TaggedItem. So you have the Result model, which is your model, and you have two models, Tag and TaggedItem, provided by django-taggit.
When you make a query such as Result.objects.filter(Q(tags__name="one")), it translates to looking up rows in the Result table that have a corresponding row in the TaggedItem table that has a corresponding row in the Tag table with name="one".
Trying to match two tag names this way translates to looking up rows in the Result table that have a corresponding row in the TaggedItem table that has a corresponding row in the Tag table with both name="one" AND name="two". You obviously never have that, as a row holds only one value: it's either "one" or "two".
These details are hidden away from you in the django-taggit implementation, but this is what happens whenever you have a ManyToMany relationship between objects.
To resolve this you can:
Option 1
Query tag after tag, narrowing the results each time, as suggested in the other answers. This might be okay for two tags, but it will not scale when you need to look for objects that have 10 tags set on them. Here is one way to do this in two steps (note that when the unevaluated queryset is passed to __in, Django runs it as a subquery of the second query):
# get the IDs of the Result objects tagged with "one"
query_1 = Result.objects.filter(tags__name="one").values('id')
# use this in a second query to filter the ID and look for the second tag.
results = Result.objects.filter(pk__in=query_1, tags__name="two")
You could achieve this with a single query so you only have one trip from the app to the database, which would look like this:
from django.db.models import Exists, OuterRef

# create a Django subquery - this is not evaluated, but used to construct the final query
subquery = Result.objects.filter(pk=OuterRef('pk'), tags__name="one").values('id')
# perform a combined query using the subquery against the database
results = Result.objects.filter(Exists(subquery), tags__name="two")
This would only make one trip to the database. (Note: filtering directly on subqueries requires Django 3.0.)
But you are still limited to two tags. If you need to check for 10 tags or more, the above is not really workable...
Option 2
Query the relationship table directly instead, and aggregate the results in a way that gives you the object IDs.
from django.contrib.contenttypes.models import ContentType
from django.db.models import Count
from taggit.models import TaggedItem

# django-taggit uses Content Types, so we need to pick up the content type (it is cached)
result_content_type = ContentType.objects.get_for_model(Result)
tag_names = ["one", "two"]
tagged_results = (
    TaggedItem.objects.filter(tag__name__in=tag_names, content_type=result_content_type)
    .values('object_id')
    .annotate(occurrence=Count('object_id'))
    .filter(occurrence=len(tag_names))
    .values_list('object_id', flat=True)
)
TaggedItem is the hidden table in the django-taggit implementation that contains the relationships. The above will query that table and aggregate all the rows that refer either to the "one" or "two" tags, group the results by the ID of the objects and then pick those where the object ID had the number of tags you are looking for.
This is a single query, and at the end it gets you the IDs of all the objects that have been tagged with both tags. It is also the exact same query regardless of whether you need 2 tags or 200.
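Both points can be checked at the SQL level with a small sketch using the standard sqlite3 module. The three tables below are a simplified, assumed stand-in for Result, TaggedItem and Tag (content types are left out, and each object/tag pair is assumed to appear at most once). The first query puts both name conditions on the same joined tag row and finds nothing; the second groups the relation rows and counts them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE result (id INTEGER PRIMARY KEY);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tagged_item (object_id INTEGER, tag_id INTEGER);
    INSERT INTO result VALUES (1), (2);
    INSERT INTO tag VALUES (1, 'one'), (2, 'two');
    -- result 1 carries both tags, result 2 only 'one'
    INSERT INTO tagged_item VALUES (1, 1), (1, 2), (2, 1);
""")

# Both conditions on the same tag row: no single row has two names.
same_row = conn.execute("""
    SELECT r.id
    FROM result AS r
    JOIN tagged_item AS ti ON ti.object_id = r.id
    JOIN tag AS t ON t.id = ti.tag_id
    WHERE t.name = 'one' AND t.name = 'two'
""").fetchall()

# Group the relation rows per object and keep objects matching every tag.
tag_names = ["one", "two"]
placeholders = ", ".join("?" for _ in tag_names)
grouped = conn.execute(f"""
    SELECT ti.object_id
    FROM tagged_item AS ti
    JOIN tag AS t ON t.id = ti.tag_id
    WHERE t.name IN ({placeholders})
    GROUP BY ti.object_id
    HAVING COUNT(ti.object_id) = ?
""", (*tag_names, len(tag_names))).fetchall()

print(same_row)
print(grouped)
```

The grouped query keeps the same shape however many names are in tag_names; only the placeholder list and the expected count change.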
Please review this and let me know if anything needs clarification.
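First of all, these two are the same:
Result.objects.filter(Q(tags__name="one"), Q(tags__name="two"))
Result.objects.filter(Q(tags__name="one") & Q(tags__name="two"))
Both apply the two conditions to the same related tag row. I think the name field is a CharField, and no single record can be equal to "one" and "two" at the same time. Chaining separate calls, as in Result.objects.filter(tags__name__in=["one"]).filter(tags__name__in=["two"]), is different: for multi-valued relations, each filter() call is matched against the related rows independently.
In Python code the single-filter query looks like this (always false, which is why you are getting no result):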
from random import choice
name = choice(["abtin", "shino"])
if name == "abtin" and name == "shino":
    print("never reached")  # a single value cannot equal both strings
We use Q objects to implement OR or other complex queries.
In the example that works, you do an AND on two Python objects (querysets). Each condition is then applied to any related record, not necessarily to one and the same record that has both "one" AND "two" as its tag.
PS: Why do you use the in filter?
q = Result.objects.filter(tags__name__in=["one"]).filter(tags__name__in=["two"])
Add .distinct() to remove duplicates if you expect more than one unique object.