Is it possible to annotate/count against a prefetched query?
My initial query below is based on circuits, but then I realised that if a site does not have any circuits, there won't be a 'None' category to show that site as down.
conn_data = Circuits.objects.all() \
.values('circuit_type__circuit_type') \
.exclude(active_link=False) \
.annotate(total=Count('circuit_type__circuit_type')) \
.order_by('circuit_type__monitor_priority')
So I changed to querying sites and using prefetch, which now gives an empty circuits_set for any site that does not have an active link. Is there a Django way of creating the new totals against that circuits_set within conn_data? I was going to loop through all the sites manually and add the totals that way, but wanted to know if there is a way to do this within the QuerySet instead.
My end result should look something like:
[
{'circuit_type__circuit_type': 'Fibre', 'total': 63},
{'circuit_type__circuit_type': 'DSL', 'total': 29},
{'circuit_type__circuit_type': 'None', 'total': 2}
]
prefetch query:
conn_data = SiteData.objects.prefetch_related(
    Prefetch(
        'circuits_set',
        queryset=Circuits.objects.exclude(active_link=False).select_related('circuit_type'),
    )
)
I don't think this will work, and it's debatable whether it should. Let's refer to what prefetch_related does:
Returns a QuerySet that will automatically retrieve, in a single batch, related objects for each of the specified lookups.
So what happens here is that two queries are dispatched and two lists are realized. These lists are then partitioned in memory and grouped to the correct parent records.
Count() and annotate() are directives to the DBMS that resolve to SQL
SELECT COUNT(id) FROM conn_data
Because of the way annotate and prefetch_related work, I think it's unlikely they will play nicely together. prefetch_related is just a convenience, though. From a practical perspective, running two separate ORM queries and assigning the results to your SiteData records yourself is effectively the same thing. So something like ...
# Gets all Circuits counted and grouped by SiteData
Circuits.objects.values('sitedata_id').exclude(active_link=False).annotate(total=Count('sitedata_id'))
Then you just loop over your SiteData records and assign the counts.
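For illustration, here is a minimal sketch of that assignment step, with plain dicts standing in for the querysets (the ids, counts, and site records are all invented):

```python
# Hypothetical rows, shaped like the output of:
#   Circuits.objects.values('sitedata_id')
#           .exclude(active_link=False)
#           .annotate(total=Count('sitedata_id'))
rows = [
    {'sitedata_id': 1, 'total': 3},
    {'sitedata_id': 2, 'total': 1},
]

# Build a lookup table keyed by site id ...
totals = {row['sitedata_id']: row['total'] for row in rows}

# ... then assign a count to every site, defaulting to 0 for
# sites with no active circuits (the missing 'None'/'Down' bucket).
sites = [{'id': 1}, {'id': 2}, {'id': 3}]  # stand-ins for SiteData records
for site in sites:
    site['circuit_total'] = totals.get(site['id'], 0)
```

The `totals.get(..., 0)` default is what produces a category for sites that have no circuits at all, which the circuits-only query could never return.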
OK, I got what I wanted with this. There is probably a better way of doing it, but it works nevertheless:
from collections import Counter
import operator
class ConnData(object):
    def __init__(self, priority='', c_type='', count=0):
        self.priority = priority
        self.c_type = c_type
        self.count = count

    def __repr__(self):
        return '{} {}'.format(self.__class__.__name__, self.c_type)

# get all the site data
conn_data = SiteData.objects.exclude(Q(site_type__site_type='Data Centre') | Q(site_type__site_type='Factory')) \
    .prefetch_related(
        Prefetch(
            'circuits_set',
            queryset=Circuits.objects.exclude(active_link=False).select_related('circuit_type'),
        )
    )

# create a list for the conns
conns = []

# add items to list of dictionaries with all required fields
for conn in conn_data:
    try:
        circuit_type = conn.circuits_set.all()[0].circuit_type
        conns.append({'circuit_type': circuit_type.circuit_type,
                      'priority': circuit_type.monitor_priority})
    except IndexError:
        # create category for down sites
        conns.append({'circuit_type': 'Down', 'priority': 10})

# create new list for class data
conn_counts = []

# create counter data
conn_count_data = Counter((d['circuit_type'], d['priority']) for d in conns)

# loop through counter data and add classes to list
for val, count in conn_count_data.items():
    cc = ConnData(priority=val[1], c_type=val[0], count=count)
    conn_counts.append(cc)

# sort the classes by priority
conn_counts = sorted(conn_counts, key=operator.attrgetter('priority'))
Context
I really want to, but I don't understand how I can limit an already existing Prefetch object
Models
class MyUser(AbstractUser):
    pass

class Absence(Model):
    employee = ForeignKey(MyUser, related_name='absences', on_delete=PROTECT)
    start_date = DateField()
    end_date = DateField()
View
class UserAbsencesListAPIView(ListAPIView):
    queryset = MyUser.objects.order_by('first_name')
    serializer_class = serializers.UserWithAbsencesSerializer
    filterset_class = filters.UserAbsencesFilterSet
Filter
class UserAbsencesFilterSet(FilterSet):
    first_name = CharFilter(lookup_expr='icontains', field_name='first_name')
    from_ = DateFilter(method='filter_from', distinct=True)
    to = DateFilter(method='filter_to', distinct=True)
What I need
With the request there are two arguments, from_ and to. I should return Users with their Absences, where the Absences are bounded by the from_ and/or to intervals. It's very simple for a single argument; I can limit the set using a Prefetch object:
def filter_from(self, queryset, name, value):
    return queryset.prefetch_related(
        Prefetch(
            'absences',
            Absence.objects.filter(Q(start_date__gte=value) | Q(start_date__lte=value, end_date__gte=value)),
        )
    )
Similarly for to.
But what if I want to get a limit by two arguments at once?
When the from_ argument is present, the filter_from method is executed; for the to argument, another method, filter_to, is executed.
I can't use prefetch_related twice, I get an exception ValueError: 'absences' lookup was already seen with a different queryset. You may need to adjust the ordering of your lookups..
I've tried using to_attr, but it looks like I can't access it in an un-evaluated queryset.
I know that I can find the first defined Prefetch in the _prefetch_related_lookups attribute of queryset, but is there any way to apply an additional filter to it or replace it with another Prefetch object so that I can end up with a query similar to:
queryset.prefetch_related(
    Prefetch(
        'absences',
        Absence.objects.filter(
            Q(Q(start_date__gte=from_) | Q(start_date__lte=from_, end_date__gte=from_))
            & Q(Q(end_date__lte=to) | Q(start_date__lte=to, end_date__gte=to))
        ),
    )
)
django-filter seems to have its own built-in filter for range queries:
More info here and here
So probably just easier to use that instead:
def filter_date_range(self, queryset, name, value):
    if self.lookup_expr == "range":
        ...  # return queryset with specific prefetch
    elif self.lookup_expr == "lte":
        ...  # return queryset with specific prefetch
    elif self.lookup_expr == "gte":
        ...  # return queryset with specific prefetch
I haven't tested this, and you may need to play around with the unpacking of value, but it should get you most of the way there.
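One way to think about that unpacking: django-filter's range filters pass a slice-like value with start and stop attributes, so the dispatch can be reduced to a small helper. This is a sketch under that assumption (the helper name and return labels are invented; plain slice objects stand in for the filter value):

```python
def pick_lookup(value):
    """Decide which prefetch to build from a range-style filter value.

    `value` is assumed to behave like the slice-like object django-filter
    passes for range filters, i.e. it exposes .start and .stop attributes.
    """
    start = getattr(value, 'start', None)
    stop = getattr(value, 'stop', None)
    if start and stop:
        return 'range'  # build the Prefetch bounded on both sides
    if start:
        return 'gte'    # only a lower bound
    if stop:
        return 'lte'    # only an upper bound
    return 'all'        # no bounds: prefetch everything

# slice() objects stand in for the filter value here:
print(pick_lookup(slice('2020-01-01', '2020-02-01')))  # range
print(pick_lookup(slice('2020-01-01', None)))          # gte
```

Each branch would then return `queryset.prefetch_related(...)` with the matching Prefetch, so the lookup is only registered once and the ValueError from duplicate 'absences' lookups never arises.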
I have a model which contains various fields like name, start_time, start_date. I want to create an API that reads my model and displays all the duplicated records.
The above answer works, but I just want to post my code as well so that other newbies won't have to look for other answers.
from django.db.models import Count
def find_duplicates(request):
    if request.method == 'GET':
        duplicates = Model.objects.values('field1', 'field2', 'field3') \
            .annotate(field1_count=Count('field1'),
                      field2_count=Count('field2'),
                      field3_count=Count('field3')) \
            .filter(field1_count__gt=1,
                    field2_count__gt=1,
                    field3_count__gt=1)
        duplicate_objects = Model.objects.filter(
            field1__in=[item['field1'] for item in duplicates],
            field2__in=[item['field2'] for item in duplicates],
            field3__in=[item['field3'] for item in duplicates],
        )
        serializer = ModelSerializer(duplicate_objects, many=True)
        return Response(serializer.data, status=status.HTTP_200_OK)
P.S. field1 is a foreign key therefore I am extracting the id here.
As far as I know, this needs to be done as a two step process. You find the fields that have duplicates in one query, and then gather all those objects in a second query.
from django.db.models import Count
duplicates = MyModel.objects.values('name') \
.annotate(name_count=Count('id')) \
.filter(name_count__gt=1)
duplicate_objects = MyModel.objects.filter(name__in=[item['name'] for item in duplicates])
duplicates will contain the names that have more than one occurrence, while duplicate_objects will contain all duplicate named objects.
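The same two-step idea can be sketched with plain Python to see what each step yields (the sample names and objects are invented):

```python
from collections import Counter

# Step 1: find the names that occur more than once
# (the equivalent of .values('name').annotate(...).filter(name_count__gt=1))
names = ['ada', 'bob', 'ada', 'cid', 'bob', 'ada']
counts = Counter(names)
duplicate_names = [name for name, n in counts.items() if n > 1]

# Step 2: gather every object carrying one of those names
# (the equivalent of .filter(name__in=duplicate_names))
objects = [{'id': i, 'name': name} for i, name in enumerate(names)]
duplicate_objects = [o for o in objects if o['name'] in duplicate_names]
```

Note that step 2 keeps every copy of a duplicated name, not just the extras, which matches what the `name__in` query returns.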
Of course, I don't mean to do what prefetch_related does already.
I'd like to mimic what it does.
What I'd like to do is the following.
I have a list of MyModel instances.
A user can either follows or doesn't follow each instance.
my_models = MyModel.objects.filter(**kwargs)
for my_model in my_models:
    my_model.is_following = Follow.objects.filter(user=user, target_id=my_model.id, target_content_type=MY_MODEL_CTYPE)
Here I have an N+1 query problem, and I think I can borrow what prefetch_related does. The description of prefetch_related says it performs the query for all objects in one go and, when the related attribute is required, gets it from the pre-performed queryset.
That's exactly what I'm after: perform the is_following query for all objects that I'm interested in, and use that one query instead of N individual queries.
One additional aspect is that, I'd like to attach queryset rather than attach the actual value, so that I can defer evaluation until pagination.
If that's too ambiguous statement, I'd like to give the my_models queryset that has is_following information attached, to another function (DRF serializer for instance).
How does prefetch_related accomplish something like above?
A solution where you can get only the is_following bit is possible with a subquery via .extra.
class MyModelQuerySet(models.QuerySet):
    def annotate_is_following(self, user):
        return self.extra(
            select={'is_following': 'EXISTS( \
                SELECT `id` FROM `follow` \
                WHERE `follow`.`target_id` = `mymodel`.id \
                AND `follow`.`user_id` = %s)' % user.id
            }
        )

class MyModel(models.Model):
    objects = MyModelQuerySet.as_manager()
usage:
my_models = MyModel.objects.filter(**kwargs).annotate_is_following(request.user)
Now another solution where you can get a whole list of following objects.
Because you have a GFK in the Follow class you need to manually create a reverse relation via GenericRelation. Something like:
class MyModelQuerySet(models.QuerySet):
    def with_user_following(self, user):
        return self.prefetch_related(
            Prefetch(
                'following',
                queryset=Follow.objects.filter(user=user).select_related('user'),
                to_attr='following_user'
            )
        )

class MyModel(models.Model):
    following = GenericRelation(Follow,
        content_type_field='target_content_type',
        object_id_field='target_id',
        related_query_name='mymodels'
    )

    objects = MyModelQuerySet.as_manager()

    def get_first_following_object(self):
        if hasattr(self, 'following_user') and len(self.following_user) > 0:
            return self.following_user[0]
        return None
usage:
my_models = MyModel.objects.filter(**kwargs).with_user_following(request.user)
Now you have access to following_user attribute - a list with all follow objects per mymodel, or you can use a method like get_first_following_object.
Not sure if this is the best approach, and I doubt it is what prefetch_related does, because I'm joining here.
I found there's a way to select extra columns in your query:
extra_select = """
EXISTS(SELECT * FROM follow_follow
WHERE follow_follow.target_object_id = myapp_mymodel.id AND
follow_follow.target_content_type_id = %s AND
follow_follow.user_id = %s)
"""
qs = self.extra(
select={'is_following': extra_select},
select_params=[CONTENT_TYPE_ID, user.id]
)
So you can do this with a join.
The prefetch_related way of doing it would be to run a separate queryset and look the attribute up in that queryset.
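That manual-lookup approach amounts to one extra query for all the follow targets, then an in-memory membership test. A sketch with plain data standing in for the two querysets (the ids are invented):

```python
# Stand-in for the id set one query would return, e.g.:
#   Follow.objects.filter(user=user, target_content_type=MY_MODEL_CTYPE)
#                 .values_list('target_id', flat=True)
followed_ids = {3, 7}

# Stand-in for the MyModel queryset
my_models = [{'id': 3}, {'id': 5}, {'id': 7}]

# One pass, no per-object query: the N+1 pattern becomes 2 queries total
for my_model in my_models:
    my_model['is_following'] = my_model['id'] in followed_ids
```

Using a set for the ids keeps the membership test O(1) per object, which is essentially the in-memory grouping prefetch_related performs after its batch query.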
The code:
from datetime import datetime, timedelta

now = datetime.now()
year_ago = now - timedelta(days=365)
category_list = Category.objects.annotate(suma=Sum('operation__value')) \
    .filter(operation__date__gte=year_ago) \
    .annotate(podsuma=Sum('operation__value'))
The idea: get the sum for each category, and the sum over just the past year.
But this code returns only the filtered objects, and suma is equal to podsuma.
A queryset only produces one query, so all annotations are calculated over the same filtered data set. You'll need to do two queries.
Update:
Do something like this:
In models.py
class Category(models.Model):
    ...

    def suma(self):
        return ...

    def podsuma(self):
        return ...
Then remove the annotations and your for loop should work as is. It'll mean a lot more queries, but they'll be simpler, and you can always cache them.
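If you'd rather keep it to two queries instead of per-instance methods, the two annotated querysets can be merged in memory. A sketch with invented rows shaped like `.values().annotate()` output:

```python
# Hypothetical results of two separate queries:
#   Category.objects.annotate(suma=Sum('operation__value'))
# and the same aggregation filtered to the last year for podsuma.
all_time = {1: 100, 2: 50}   # category id -> suma
last_year = {1: 40}          # category id -> podsuma

# Merge: every category keeps its all-time sum, and gets the
# year-bounded sum or 0 when it had no operations in that window.
merged = {
    cat_id: {'suma': suma, 'podsuma': last_year.get(cat_id, 0)}
    for cat_id, suma in all_time.items()
}
```

Because each sum comes from its own query, the year filter can no longer bleed into the all-time aggregation, which is exactly the problem with chaining filter between two annotate calls.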
In mysql, you can insert multiple rows to a table in one query for n > 0:
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9), ..., (n-2, n-1, n);
Is there a way to achieve the above with Django queryset methods? Here's an example:
values = [(1, 2, 3), (4, 5, 6), ...]
for value in values:
    SomeModel.objects.create(first=value[0], second=value[1], third=value[2])
I believe the above is calling an insert query for each iteration of the for loop. I'm looking for a single query, is that possible in Django?
These answers are outdated; bulk_create was introduced in Django 1.4:
https://docs.djangoproject.com/en/dev/ref/models/querysets/#bulk-create
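Usage is a one-liner, e.g. `SomeModel.objects.bulk_create([SomeModel(first=1, second=2, third=3), SomeModel(first=4, second=5, third=6)])`. The multi-row VALUES statement it aims for can be seen outside Django with an in-memory SQLite table (table and column names invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE tbl_name (a INTEGER, b INTEGER, c INTEGER)')

# One statement, many rows - the shape a bulk insert produces
conn.execute('INSERT INTO tbl_name (a, b, c) VALUES (?, ?, ?), (?, ?, ?)',
             (1, 2, 3, 4, 5, 6))

rows = conn.execute('SELECT a, b, c FROM tbl_name ORDER BY a').fetchall()
```

The round trip and statement parsing happen once regardless of row count, which is where the speedup over per-row create() calls comes from.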
I recently looked for such a thing myself (inspired by QuerySet.update(), as I imagine you are too). To my knowledge, no bulk create exists in the current production framework (1.1.1 as of today). We ended up creating a custom manager for the model that needed bulk-create, and created a function on that manager to build an appropriate SQL statement with the sequence of VALUES parameters.
Something like (apologies if this does not work... hopefully I've adapted this runnably from our code):
from django.db import models, connection
class MyManager(models.Manager):
    def create_in_bulk(self, values):
        base_sql = "INSERT INTO tbl_name (a,b,c) VALUES "
        values_sql = []
        values_data = []
        for value_list in values:
            placeholders = ['%s' for i in range(len(value_list))]
            values_sql.append("(%s)" % ','.join(placeholders))
            values_data.extend(value_list)
        sql = '%s%s' % (base_sql, ', '.join(values_sql))
        curs = connection.cursor()
        curs.execute(sql, values_data)
class MyObject(models.Model):
    # model definition as usual... assume:
    foo = models.CharField(max_length=128)

    # custom manager
    objects = MyManager()

MyObject.objects.create_in_bulk([('hello',), ('bye',), ('c',)])
This approach does run the risk of being very specific to a particular database. In our case, we wanted the function to return the IDs just created, so we had a postgres-specific query in the function to generate the requisite number of IDs from the primary key sequence for the table that represents the object. That said, it does perform significantly better in tests versus iterating over the data and issuing separate QuerySet.create() statements.
Here is a way to do batch inserts that still goes through Django's ORM (and thus retains the many benefits the ORM provides). This approach involves subclassing the InsertQuery class, as well as creating a custom manager that prepares model instances for insertion into the database in much the same way that Django's save() method does. Most of the code for the BatchInsertQuery class below is straight from the InsertQuery class, with just a few key lines added or modified.
To use the batch_insert method, pass in a set of model instances that you want to insert into the database. This approach frees up the code in your views from having to worry about translating model instances into valid SQL values; the manager class, in conjunction with the BatchInsertQuery class, handles that.
from django.db import models, connection
from django.db.models.sql import InsertQuery
class BatchInsertQuery(InsertQuery):

    ####################################################################
    def as_sql(self):
        """
        Constructs a SQL statement for inserting all of the model
        instances into the database.

        Differences from base class method:
        - The VALUES clause is constructed differently to account for
          the grouping of the values (actually, placeholders) into
          parenthetically-enclosed groups. I.e., VALUES (a,b,c),(d,e,f)
        """
        qn = self.connection.ops.quote_name
        opts = self.model._meta
        result = ['INSERT INTO %s' % qn(opts.db_table)]
        result.append('(%s)' % ', '.join([qn(c) for c in self.columns]))
        result.append('VALUES %s' % ', '.join('(%s)' % ', '.join(
            values_group) for values_group in self.values))  # This line is different
        params = self.params
        if self.return_id and self.connection.features.can_return_id_from_insert:
            col = "%s.%s" % (qn(opts.db_table), qn(opts.pk.column))
            r_fmt, r_params = self.connection.ops.return_insert_id()
            result.append(r_fmt % col)
            params = params + r_params
        return ' '.join(result), params

    ####################################################################
    def insert_values(self, insert_values):
        """
        Adds the insert values to the instance. Can be called multiple
        times for multiple instances of the same model class.

        Differences from base class method:
        - Clears self.columns so that self.columns won't be duplicated
          for each set of inserted_values.
        - Appends the insert_values to self.values instead of extending,
          so that the values (actually the placeholders) remain grouped
          separately for the VALUES clause of the SQL statement.
          I.e., VALUES (a,b,c),(d,e,f)
        - Removes inapplicable code
        """
        self.columns = []  # This line is new
        placeholders, values = [], []
        for field, val in insert_values:
            placeholders.append('%s')
            self.columns.append(field.column)
            values.append(val)
        self.params += tuple(values)
        self.values.append(placeholders)  # This line is different

########################################################################
class ManagerEx(models.Manager):
    """
    Extended model manager class.
    """
    def batch_insert(self, *instances):
        """
        Issues a batch INSERT using the specified model instances.
        """
        cls = instances[0].__class__
        query = BatchInsertQuery(cls, connection)
        for instance in instances:
            values = [(f, f.get_db_prep_save(f.pre_save(instance, True)))
                      for f in cls._meta.local_fields]
            query.insert_values(values)
        return query.execute_sql()

########################################################################
class MyModel(models.Model):
    myfield = models.CharField(max_length=255)
    objects = ManagerEx()

########################################################################
# USAGE:
object1 = MyModel(myfield="foo")
object2 = MyModel(myfield="bar")
object3 = MyModel(myfield="bam")
MyModel.objects.batch_insert(object1, object2, object3)
You might get the performance you need by doing manual transactions. What this will allow you to do is to create all the inserts in one transaction, then commit the transaction all at once. Hopefully this will help you: http://docs.djangoproject.com/en/dev/topics/db/transactions/
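The win comes from grouping many inserts into one commit instead of committing once per row. The effect can be sketched outside Django with SQLite (in modern Django the equivalent wrapper is transaction.atomic):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (x INTEGER)')

# All inserts happen inside a single transaction and are
# committed together on exit, instead of once per row.
with conn:  # the connection as a context manager = one transaction
    for x in range(1000):
        conn.execute('INSERT INTO t (x) VALUES (?)', (x,))

count = conn.execute('SELECT COUNT(*) FROM t').fetchone()[0]
```

Each statement is still sent individually, so this is slower than a single multi-row INSERT, but it avoids the per-row fsync cost of autocommit mode.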
No, it is not possible, because Django models are objects rather than tables, so table-level actions are not applicable to them. Django creates an object and then inserts its data into the table, therefore you can't create multiple objects at one time.