In views.py
I want to randomly choose one record with my filter:
a = Entry.objects.filter(first_name__contains='Br').order_by('?')[0]
b = a.id
c = Entry.objects.filter(first_name__contains='Br').order_by('?')[0]
d = c.id
It is possible that b and d are the same.
But my goal is to get a different Entry object and id each time. How can I do this?
How about fetching both objects in the same query? That way you know you have two distinct entries.
a, c = Entry.objects.filter(first_name__contains='Br').order_by('?')[0:2]
b = a.id
d = c.id
Note that this will raise a ValueError if the filter matches fewer than two entries.
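To make the fewer-than-two case explicit, here is a minimal sketch outside Django that uses only the standard-library sqlite3 module. It assumes that order_by('?')[0:2] corresponds roughly to ORDER BY RANDOM() LIMIT 2; the table name entry and the helper pick_two_distinct are hypothetical names chosen for illustration.

```python
import sqlite3


def pick_two_distinct(conn, fragment):
    """Return two distinct (id, first_name) rows, or None if fewer than two match."""
    rows = conn.execute(
        "SELECT id, first_name FROM entry "
        "WHERE first_name LIKE ? ORDER BY RANDOM() LIMIT 2",
        (f"%{fragment}%",),
    ).fetchall()
    if len(rows) < 2:
        # Unpacking `a, c = rows` here would raise ValueError, so bail out early.
        return None
    return rows[0], rows[1]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, first_name TEXT)")
conn.executemany(
    "INSERT INTO entry (first_name) VALUES (?)",
    [("Brad",), ("Bruce",), ("Anna",), ("Brenda",)],
)

pair = pick_two_distinct(conn, "Br")
```

Because both rows come from a single result set, they cannot be the same row, which is the point of fetching them in one query.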
class A(models.Model):
    results = models.TextField()

class B(models.Model):
    name = models.CharField(max_length=20)
    res = models.ManyToManyField(A)
Let's suppose we have the two models above, and that model A has millions of objects.
I would like to know the most efficient (fastest) way to get all the A objects (results) related to a particular B object.
Let's suppose we have to retrieve all results for the B object with id 5.
Option 1 : A.objects.filter(b__id=5)
(OR)
Option 2 : B.objects.get(id=5).res.all()
Option 1: My question is, would filtering by id on the A model take a lot of time, since there are millions of A objects?
Option 2: My question is, does the res field on the B model store the id values of the related A objects?
I'm assuming option 2 would be faster, since B stores references to the A objects and can fetch them directly (first fetching the B object, then making a second query for the results), whereas in option 1 filtering by id or any other field would take a lot of time.
The first expression will result in one database query. Indeed, it will query with:
SELECT a.*
FROM a
INNER JOIN a_b ON a_b.a_id = a.id
WHERE a_b.b_id = 5
The second expression will result in two queries. Indeed, first Django will query to fetch that specific B object with a query like:
SELECT b.*
FROM b
WHERE b.id = 5
then it will make exactly the same query to retrieve the related A objects.
But retrieving the B object is not necessary here (unless, of course, you need it somewhere else). You thus make a useless database query.
My Question is filtering by id on A model objects would take lot of time? since there are millions of A model objects.
A database normally keeps an index on foreign key columns, so it can filter efficiently. The total number of A objects is usually not that relevant, since the index uses a data structure that accelerates lookups, such as a B-tree. The Wikipedia page on B-trees has a section named "An index speeds the search" that explains how this works.
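As a rough illustration of the indexed join, the sketch below rebuilds the tables in an in-memory SQLite database and asks for the query plan. The join table name a_b follows the SQL shown above, and the index name a_b_b_id_idx is an assumption; the table and index names Django actually generates may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, results TEXT)")
conn.execute("CREATE TABLE b (id INTEGER PRIMARY KEY, name TEXT)")
# Hypothetical join table for the many-to-many field, indexed on b_id.
conn.execute("CREATE TABLE a_b (id INTEGER PRIMARY KEY, b_id INTEGER, a_id INTEGER)")
conn.execute("CREATE INDEX a_b_b_id_idx ON a_b (b_id)")

conn.executemany(
    "INSERT INTO a (id, results) VALUES (?, ?)",
    [(i, f"result {i}") for i in range(1, 1001)],
)
conn.execute("INSERT INTO b (id, name) VALUES (5, 'five')")
conn.executemany("INSERT INTO a_b (b_id, a_id) VALUES (5, ?)", [(10,), (20,), (30,)])

query = """
    SELECT a.id, a.results
    FROM a
    INNER JOIN a_b ON a_b.a_id = a.id
    WHERE a_b.b_id = 5
    ORDER BY a.id
"""
rows = conn.execute(query).fetchall()

# The last column of each EXPLAIN QUERY PLAN row is a human-readable step.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for step in plan:
    print(step)
```

The printed plan shows how SQLite chooses to reach the rows; the exact wording depends on the SQLite version.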
I want to efficiently annotate Model A objects based on some fields on model B which has a plain many-to-many relationship (not using a through model) to A. A wrinkle is that I must find the oldest B for each A (using B.created_timestamp) but then populate using B.name. I want to use the ORM not raw SQL.
I tried this but it's not correct:
a_qs = A.objects.filter(id__in=ids)
ordered_qs = a_qs.order_by('-b__created_timestamp')
oldest_qs = Subquery(ordered_qs.values('b__name')[:1])
result = list(a_qs.annotate(name=oldest_qs))
This annotates every A with the same oldest name of B across all Bs related to A, but I want the oldest B among associated Bs for each A.
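You forgot to use OuterRef; see https://docs.djangoproject.com/en/2.2/ref/models/expressions/
from django.db.models import OuterRef, Subquery
# Ascending order puts the oldest B first; OuterRef ties the subquery to each A row.
b_qs = B.objects.filter(a=OuterRef('pk')).order_by('created_timestamp')
a_qs = A.objects.filter(id__in=ids).annotate(oldest_name=Subquery(b_qs.values('name')[:1]))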
result = list(a_qs)
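For readers who want to see what a correlated subquery does, here is a hedged sketch in plain SQL run through sqlite3. The join table b_a and the sample rows are assumptions made for illustration, and the SQL is only an approximation of what the ORM would generate. The inner query is ordered ascending so that the oldest B comes first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE b (id INTEGER PRIMARY KEY, name TEXT, created_timestamp INTEGER)")
# Hypothetical many-to-many join table between B and A.
conn.execute("CREATE TABLE b_a (b_id INTEGER, a_id INTEGER)")

conn.executemany("INSERT INTO a (id) VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO b (id, name, created_timestamp) VALUES (?, ?, ?)",
    [(1, "old", 100), (2, "mid", 200), (3, "new", 300)],
)
conn.executemany(
    "INSERT INTO b_a (b_id, a_id) VALUES (?, ?)",
    [(1, 1), (3, 1), (2, 2), (3, 2)],
)

# The inner SELECT refers to a.id from the outer query: that reference is
# the role OuterRef('pk') plays in the ORM version.
result = conn.execute("""
    SELECT a.id,
           (SELECT b.name
            FROM b
            INNER JOIN b_a ON b_a.b_id = b.id
            WHERE b_a.a_id = a.id
            ORDER BY b.created_timestamp ASC
            LIMIT 1) AS oldest_name
    FROM a
    ORDER BY a.id
""").fetchall()
```

Each A row gets its own oldest related name, and an A with no related B gets NULL (None in Python).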
I have the following queryset:
photos = Photo.objects.all()
I filter out two queries:
a = photos.filter(gallery__name='NGL')
b = photos.filter(gallery__name='NGA')
I add them together, and they form one new, bigger queryset:
c = a | b
Indeed, the length of a + b equals c:
a.count() + b.count() == c.count()
>>> True
So far so good. Yet, if I introduce a .annotate(), the | no longer seems to work:
a = photos.annotate(c=Count('label')).exclude(c__lte=4)
b = photos.filter(painting=True)
c = a | b
a.count() + b.count() == c.count()
>>> False
How do I combine querysets, even when .annotate() is being used? Note that query one and two both work as intended in isolation, only when combining them using | does it seem to go wrong.
Combining querysets with the pipe | or ampersand & actually puts an OR or AND into the WHERE clause of a single SQL query, so the result only looks like a combination of the two.
one = Photo.objects.filter(id=1)
two = Photo.objects.filter(id=2)
combined = one | two
print(combined.query)
>>> ... WHERE ("photo_photo"."id" = 1 OR "photo_photo"."id" = 2)...
But when you combine more filters and excludes you may notice it will give you strange results due to this. So that is why it doesn't match when you compare counts.
If you use .union(), both querysets must select the same columns with the same data types, so you have to annotate both querysets. The rules for a SQL UNION are:
Every SELECT statement within the UNION must have the same number of columns.
The columns must also have similar data types.
The columns in each SELECT statement must also be in the same order.
Keep in mind that Python collects keyword arguments (**kwargs) into a dictionary. On Python versions before 3.6 that dictionary does not preserve the order of the arguments, so if you pass multiple annotations in one annotate() call you can't be sure of the column order. Fortunately, you can solve this by chaining annotate() calls.
# column order of this query may not be consistent (Python < 3.6)
photo_queryset.annotate(label_count=Count('labels'), tag_count=Count('tags'))
# this will always have same order of columns
photo_queryset.annotate(label_count=Count('labels')).annotate(tag_count=Count('tags'))
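The ordering concern can be checked in plain Python. The sketch below uses a hypothetical stand-in function, annotation_order, which is not part of Django. It only shows that on Python 3.6 and later, keyword arguments arrive in call order; whether a given Django release keeps that order when it builds the SELECT list is not shown here, so chaining remains the conservative choice.

```python
def annotation_order(**annotations):
    """Return the keyword names in the order the caller passed them."""
    return list(annotations)


# One call with two keywords: Python 3.6+ preserves the call order.
single_call = annotation_order(
    label_count="Count('labels')",
    tag_count="Count('tags')",
)

# Chaining: each call contributes its own keyword, one after another,
# so the order is explicit regardless of how kwargs are stored.
chained = (
    annotation_order(label_count="Count('labels')")
    + annotation_order(tag_count="Count('tags')")
)
```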
Then you can use .union() and it won't mess up the results of the annotations. Also, .union() should be the last method in the chain, because after .union() you can't use filter-like methods. If you want to preserve duplicates, use .union(qs, all=True): the default is all=False, which removes duplicate rows from the combined result.
photos = Photo.objects.annotate(c=Count('labels'))
one = photos.exclude(c__lte=4)
two = photos.filter(painting=True)
combined = one.union(two, all=True)  # named "combined" to avoid shadowing the built-in all()
one.count() + two.count() == combined.count()
>>> True
Then it should work as you described in the question.
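The difference between all=True and the default can be seen directly in SQL. This is a small sketch with sqlite3; the photo table, its label_count column (standing in for the annotation), and the sample rows are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE photo (id INTEGER PRIMARY KEY, painting INTEGER, label_count INTEGER)"
)
conn.executemany(
    "INSERT INTO photo (id, painting, label_count) VALUES (?, ?, ?)",
    [
        (1, 1, 6),  # matches both queries
        (2, 0, 7),  # matches only "more than 4 labels"
        (3, 1, 2),  # matches only "painting"
        (4, 0, 1),  # matches neither
    ],
)

one_sql = "SELECT id, label_count FROM photo WHERE label_count > 4"
two_sql = "SELECT id, label_count FROM photo WHERE painting = 1"


def count(sql):
    """Count the rows produced by an arbitrary SELECT statement."""
    return conn.execute(f"SELECT COUNT(*) FROM ({sql})").fetchone()[0]


count_one = count(one_sql)
count_two = count(two_sql)
count_union_all = count(f"{one_sql} UNION ALL {two_sql}")  # keeps duplicates
count_union = count(f"{one_sql} UNION {two_sql}")          # removes duplicates
```

Photo 1 matches both queries, so UNION ALL counts it twice while plain UNION counts it once. Only the UNION ALL count equals the sum of the two individual counts.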
Given a simple set of models as follows:
class A(models.Model):
    pass

class B(models.Model):
    parent = models.ForeignKey(A, related_name='b_set')

class C(models.Model):
    parent = models.ForeignKey(B, related_name='c_set')
I am looking to create a query set of the A model with two annotations. One annotation should be the number of B rows that have the A row in question as their parent. The other annotation should denote the number of B rows, again with the A object in question as parent, which have at least n objects of type C in their c_set.
As an example, consider the following database and n = 3:
Table A
id
0
1
Table B
id parent
0 0
1 0
Table C
id parent
0 0
1 0
2 1
3 1
4 1
I'd like to be able to get a result of the form [(0, 2, 1), (1, 0, 0)] as the A object with id 0 has two B objects of which one has at least three related C objects. The A object with id 1 has no B objects and therefore also no B objects with at least three C rows.
The first annotation is trivial:
A.objects.annotate(annotation_1=Count('b_set'))
What I am trying to design now is the second annotation. I have managed to count the number of B rows per A where the B object has at least a single C object as follows:
A.objects.annotate(annotation_2=Count('b_set__c_set__parent', distinct=True))
But I cannot figure out a way to do it with a minimum related set size other than one. Hopefully someone here can point me in the right direction. One method I was thinking of was somehow annotating the B objects in the query instead of the A rows as is the default of the annotate method but I could not find any resources on this.
This is a complicated query at the limits of Django 1.11. I decided to do it with two queries and to combine the results into one list that a view can use like a queryset:
from django.db.models import Count
# ids of B rows that have at least n related C rows
sub_qs = (
    C.objects
    .values('parent')
    .annotate(c_count=Count('id'))
    .order_by()
    .filter(c_count__gte=n)
    .values('parent')
)
# per A: how many of its B rows are in that set
qs = B.objects.filter(id__in=sub_qs).values('parent_id').annotate(cnt=Count('id'))
qs_map = {x['parent_id']: x['cnt'] for x in qs}
rows = list(A.objects.annotate(annotation_1=Count('b_set')))
for row in rows:
    row.annotation_2 = qs_map.get(row.id, 0)
The list rows is the result. The more complicated qs.query is compiled to relatively simple SQL:
>>> print(str(qs.query))
SELECT app_b.parent_id, COUNT(app_b.id) AS cnt
FROM app_b
WHERE app_b.id IN (
SELECT U0.parent_id AS Col1 FROM app_c U0
GROUP BY U0.parent_id HAVING COUNT(U0.id) >= 3
)
GROUP BY app_b.parent_id; -- (added white space and removed double quotes)
This simple solution is easier to modify and test.
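The two-query approach can be checked against the example data from the question (n = 3). This sketch runs the SQL printed above through sqlite3 and merges the results in Python. The table names app_a, app_b and app_c follow the printed query, and the LEFT JOIN used for the first annotation is an assumption about what Count('b_set') compiles to.

```python
import sqlite3

n = 3
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app_a (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE app_b (id INTEGER PRIMARY KEY, parent_id INTEGER)")
conn.execute("CREATE TABLE app_c (id INTEGER PRIMARY KEY, parent_id INTEGER)")

# Example database from the question.
conn.executemany("INSERT INTO app_a (id) VALUES (?)", [(0,), (1,)])
conn.executemany("INSERT INTO app_b (id, parent_id) VALUES (?, ?)", [(0, 0), (1, 0)])
conn.executemany(
    "INSERT INTO app_c (id, parent_id) VALUES (?, ?)",
    [(0, 0), (1, 0), (2, 1), (3, 1), (4, 1)],
)

# Second annotation: per A, the number of B rows with at least n C rows.
qs_map = dict(conn.execute("""
    SELECT app_b.parent_id, COUNT(app_b.id) AS cnt
    FROM app_b
    WHERE app_b.id IN (
        SELECT U0.parent_id FROM app_c U0
        GROUP BY U0.parent_id HAVING COUNT(U0.id) >= ?
    )
    GROUP BY app_b.parent_id
""", (n,)))

# First annotation: per A, the number of B rows; then merge both in Python.
rows = [
    (a_id, b_count, qs_map.get(a_id, 0))
    for a_id, b_count in conn.execute("""
        SELECT app_a.id, COUNT(app_b.id)
        FROM app_a
        LEFT JOIN app_b ON app_b.parent_id = app_a.id
        GROUP BY app_a.id
        ORDER BY app_a.id
    """)
]
```

With this data, rows comes out as [(0, 2, 1), (1, 0, 0)], which is the result the question asked for.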
Note: A single-query solution also exists, but it doesn't seem useful. Why: it would require Subquery and OuterRef(). They are great, but in general Count() aggregation is not supported in subqueries that are compiled together with join resolution. A subquery can be separated out with a ...__in=... lookup so that Django can compile it, but then it is not possible to use OuterRef(). Written without OuterRef(), it becomes nested SQL so complicated and suboptimal that the time complexity would probably be O(n^2) in the size of the A table for many (or all) database backends. Not tested.
There are two models with a one-to-many relationship, A->{B}. After applying a filter(), I count how many B records are connected to each A. Then I need to extract the top X records of A, ranked by the number of B records connected to them.
The current code:
class A(models.Model):
    code = models.IntegerField()
    ...

class B(models.Model):
    a = models.ForeignKey(A)
    ...

data = B.objects.all().filter(...)
top = data.values('a',...).annotate(n=Count('a')).distinct().order_by('-n')[:X]
I have ~300k B records, and on my laptop this takes ~2s for one query. I dissected the query into parts and timed them, and it seems the main bottleneck is the annotate().
Is there any way whatsoever to do this faster with Django?
You should add .select_related('a') before annotate in the queryset. This will force django to join the models before counting them.
https://docs.djangoproject.com/en/1.9/ref/models/querysets/#select-related
I suspect the slowdown is actually in the DISTINCT, rather than the count.
When Django builds a query for queryset.values(x).annotate(...), it groups by the values() fields first and then performs the aggregate, so the DISTINCT is not needed.
B.objects.filter(...).values('a').annotate(n=Count('*')).order_by('-n')[:10]
That should generate SQL that looks something like:
SELECT b.a,
count(*) AS n
FROM b
GROUP BY (b.a)
ORDER BY count(*) DESC
LIMIT 10
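To see the grouped query return the top entries without any DISTINCT, here is a minimal sqlite3 sketch. The column name a_id and the sample rows are assumptions for illustration; the SQL mirrors the statement above with a limit of 2 instead of 10.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER)")
# Ten B rows spread over four A ids: A=4 has four, A=1 three, A=2 two, A=3 one.
conn.executemany(
    "INSERT INTO b (a_id) VALUES (?)",
    [(a_id,) for a_id in [1, 1, 1, 2, 2, 3, 4, 4, 4, 4]],
)

top = conn.execute(
    """
    SELECT b.a_id, COUNT(*) AS n
    FROM b
    GROUP BY b.a_id
    ORDER BY n DESC
    LIMIT ?
    """,
    (2,),
).fetchall()
```

GROUP BY already yields one row per a_id, so each A appears once in the result and the rows come back ordered by their count.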