So I have a database of books, and I want to search it based on filters and keywords so I've overridden the get_queryset method in my BookSearch view:
class BookSearch(generics.ListAPIView):
serializer_class = ProductDetailViewSerializer
model = ProductDetailView
def get_queryset(self):
queryset = None
categories = self.kwargs['categories'].rstrip()
keywords = self.kwargs['keywords'].rstrip()
if isinstance(categories, str) and isinstance(keywords, str):
book_filter = BookFilter(categories)
sql = self.get_sql(categories, keywords, book_filter)
queryset = ProductDetailView.objects.filter(
id__in=RawSQL(sql, book_filter.params)
)
message = f"{queryset.query}"
log_to_file('BookSearch.log', 'BookSearch.get_queryset', message)
return queryset
That log_to_file call logs the query that django uses, which I've abbreviated here
but is as follows:
SELECT `jester_productdetailview`.`id`,
`jester_productdetailview`.`isbn`,
`jester_productdetailview`.`title`
FROM `jester_productdetailview`
WHERE `jester_productdetailview`.`id` IN (
select id from jester_productdetailview
where ( authors like '%Beatrix%' or
illustrators like '%Beatrix%' or
title like '%Beatrix%' ) )
ORDER BY `jester_productdetailview`.`title` ASC
If I run that query in my database manually, I get 186 rows:
'119371','9780723259572','A Beatrix Potter Treasury'
'130754','9780241293348','A Christmas Wish'
'117336','9780241358740','A Pumpkin for Peter' ...
To get the query above, I call the view through the API, yet by the time the queryset is returned, there are no results ???
http://127.0.0.1:8000/api/book-search/{"filter": "all"}/Beatrix/
returns []
You return the queryset only inside the if block, so whenever the condition is false the method returns None. Return a queryset outside the if block as well.
Context
I really want to, but I don't understand how I can limit an already existing Prefetch object
Models
class MyUser(AbstractUser):
    pass
class Absence(Model):
    employee = ForeignKey(MyUser, related_name='absences', on_delete=PROTECT)
    start_date = DateField()
    end_date = DateField()
View
class UserAbsencesListAPIView(ListAPIView):
    queryset = MyUser.objects.order_by('first_name')
    serializer_class = serializers.UserWithAbsencesSerializer
    filterset_class = filters.UserAbsencesFilterSet
Filter
class UserAbsencesFilterSet(FilterSet):
    first_name = CharFilter(lookup_expr='icontains', field_name='first_name')
    from_ = DateFilter(method='filter_from', distinct=True)
    to = DateFilter(method='filter_to', distinct=True)
What do I need
The request carries two arguments, from_ and to. I should return Users with their Absences, where the Absences are bounded by from_ and/or to. For a single argument this is simple; I can limit the set using a Prefetch object:
def filter_from(self, queryset, name, value):
    return queryset.prefetch_related(
        Prefetch(
            'absences',
            Absence.objects.filter(Q(start_date__gte=value) | Q(start_date__lte=value, end_date__gte=value)),
        )
    )
Similarly for to.
But what if I want to get a limit by two arguments at once?
When the from_ attribute is requested - 'filter_from' method is executed; for the to argument, another method filter_to is executed.
I can't use prefetch_related twice, I get an exception ValueError: 'absences' lookup was already seen with a different queryset. You may need to adjust the ordering of your lookups..
I've tried using to_attr, but it looks like I can't access it in an un-evaluated queryset.
I know that I can find the first defined Prefetch in the _prefetch_related_lookups attribute of queryset, but is there any way to apply an additional filter to it or replace it with another Prefetch object so that I can end up with a query similar to:
queryset.prefetch_related(
    Prefetch(
        'absences',
        Absence.objects.filter(
            Q(Q(start_date__gte=from_) | Q(start_date__lte=from_, end_date__gte=from_))
            & Q(Q(end_date__lte=to) | Q(start_date__lte=to, end_date__gte=to))
        ),
    )
)
django-filter seems to have its own built-in filter for range queries:
More info here and here
So probably just easier to use that instead:
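def filter_date_range(self, queryset, name, value):
    # Placeholder branches: each should return the queryset with its own Prefetch.
    if self.lookup_expr == "range":
        return queryset  # return queryset with specific prefetch
    if self.lookup_expr == "lte":
        return queryset  # return queryset with specific prefetch
    if self.lookup_expr == "gte":
        return queryset  # return queryset with specific prefetch
    return queryset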
I haven't tested this and you may need to play around with the unpacking of value but it should get you most of the way there.
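One way to see why a single Prefetch is enough: assuming start_date <= end_date for every Absence, the from_ condition in the question reduces to end_date >= from_, and the to condition reduces to start_date <= to, which together form the usual interval-overlap test. The sketch below is plain Python, not django-filter code; absence_in_window and absence_lookups are hypothetical helper names, and the returned dictionary is only meant as keyword arguments one could pass to Absence.objects.filter() inside one Prefetch.

```python
from datetime import date
from typing import Optional


def absence_in_window(start: date, end: date,
                      from_: Optional[date] = None,
                      to: Optional[date] = None) -> bool:
    """Return True if the absence [start, end] overlaps the requested window.

    Assumes start <= end. A bound of None is treated as "no limit".
    """
    if from_ is not None and end < from_:
        return False  # the absence ended before the window opens
    if to is not None and start > to:
        return False  # the absence starts after the window closes
    return True


def absence_lookups(from_: Optional[date] = None,
                    to: Optional[date] = None) -> dict:
    """Hypothetical helper: build filter keyword lookups for the given bounds."""
    lookups = {}
    if from_ is not None:
        lookups["end_date__gte"] = from_
    if to is not None:
        lookups["start_date__lte"] = to
    return lookups


# Both bounds supplied: one dictionary, hence one filtered queryset.
window = absence_lookups(from_=date(2024, 1, 10), to=date(2024, 1, 20))
```

Because both bounds end up in one dictionary, the Prefetch only has to be created once, for example in a single place that can see both cleaned values, rather than once per filter method.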
I'm overriding Django Admin's list_filter (to customize the filter that shows on the right side on the django admin UI for a listview). The following code works, but isn't optimized: it increases SQL queries by "number of product categories".
(The parts to focus on, in the following code sample are, qs.values_list('product_category', flat=True) which only returns an id (int), so I've to use ProductCategory.objects.get(id=i).)
Wondering if this can be simplified?
(E.g. data: Suppose the product categories are "baked" "fried" "raw" etc., and the Items are "bread" "fish fry" "cake". So when the Item list is displayed in Django Admin, all product categories will show on the 'Filter By' column on the right side of the UI.)
from django.utils.translation import ugettext_lazy as _
from django.contrib.admin import SimpleListFilter
from product_category.model import ProductCategory
class ProductCategoryFilter(SimpleListFilter):
    title = _('ProductCategory')
    parameter_name = 'product_category'
    def lookups(self, request, model_admin):
        qs = model_admin.get_queryset(request)
        ordered_filter_obj_list = []
        # TODO: Works, but increases SQL queries by "number of product categories"
        for i in (
            qs.values_list("product_category", flat=True)
            .distinct()
            .order_by("product_category")
        ):
            cat = ProductCategory.objects.get(id=i)
            ordered_filter_obj_list.append((i, cat))
        return ordered_filter_obj_list
    def queryset(self, request, queryset):
        if self.value():
            return queryset.filter(product_category__exact=self.value())
# P.S. Above filter is used in another class like so
class ItemAdmin(admin.ModelAdmin):
    list_filter = (ProductCategoryFilter,)
You are probably looking for select_related. I do not know your exact model structure, but you could use it as follows:
cats = set()
for p in Product.objects.all().select_related('category'):
    # Without select_related(), this would make a database query for each
    # loop iteration in order to fetch the related categories for each product.
    cats.add(p.category)
I am assuming there is some relation between your Product and ProductCategory models. Hope this helps.
Hah, phrasing the question makes it clear in your own head! Found an answer mins after posting this:
(Instead of doing an objects.get() inside the for loop, we can do objects.all() (which is a single SQL Query) and fill up a temporary dictionary. Then use this temp dict to find the associated string value.)
def lookups(self, request, model_admin):
    qs = model_admin.get_queryset(request)
    category_list = {}
    for x in ProductCategory.objects.all():
        category_list[x.id] = str(x)
    ordered_filter_obj_list = []
    for i in (
        qs.values_list("product_category", flat=True)
        .distinct().order_by("product_category")
    ):
        ordered_filter_obj_list.append((i, category_list[i]))
    return ordered_filter_obj_list
First parameter on the tuple list is the value of the lookup, and the second is just the name for display. This can be done in a single SQL query, or via the Django ORM:
def lookups(self, request, model_admin):
    qs = model_admin.get_queryset(request).select_related('product_category')
    # Assuming ProductCategory has an attribute 'name'.
    values = qs.values('product_category_id', 'product_category__name')
    # Note: distinct() with field names is only supported on PostgreSQL.
    unique_categories = values.distinct('product_category_id', 'product_category__name')
    categories = []
    for c in unique_categories:
        categories.append((c['product_category_id'], c['product_category__name']))
    return categories
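To make the "single SQL query" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than the Django ORM. The table and column names (item, product_category, name) are assumptions chosen to mirror the question, not the asker's real schema. One JOIN with DISTINCT returns the (id, label) pairs for the categories that actually have items, so the number of queries no longer grows with the number of categories.

```python
import sqlite3

# In-memory database with a hypothetical schema mirroring the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item (
        id INTEGER PRIMARY KEY,
        name TEXT,
        product_category_id INTEGER REFERENCES product_category(id)
    );
    INSERT INTO product_category VALUES (1, 'baked'), (2, 'fried'), (3, 'raw');
    INSERT INTO item VALUES (1, 'bread', 1), (2, 'fish fry', 2), (3, 'cake', 1);
""")


def lookups_single_query(connection):
    """Return (category id, category name) pairs for categories in use."""
    rows = connection.execute("""
        SELECT DISTINCT c.id, c.name
        FROM item AS i
        JOIN product_category AS c ON c.id = i.product_category_id
        ORDER BY c.id
    """)
    return [(cat_id, name) for cat_id, name in rows]


lookup_pairs = lookups_single_query(conn)
```

With this sample data the 'raw' category is left out because no item refers to it, which matches the behaviour of the original loop over qs.values_list(...).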
I'm working on a django project with the following models.
class User(models.Model):
    pass
class Item(models.Model):
    user = models.ForeignKey(User)
    item_id = models.IntegerField()
There are about 10 million items and 100 thousand users.
My goal is to override the default admin search that takes forever and
return all the matching users that own "all" of the specified item ids within a reasonable timeframe.
These are a couple of the tests I use to better illustrate my criteria.
class TestSearch(TestCase):
    def search(self, searchterm):
        """A tuple is returned with the first element as the queryset"""
        return do_admin_search(User.objects.all())
    def test_return_matching_users(self):
        user = User.objects.create()
        Item.objects.create(item_id=12345, user=user)
        Item.objects.create(item_id=67890, user=user)
        result = self.search('12345 67890')
        assert_equal(1, result[0].count())
        assert_equal(user, result[0][0])
    def test_exclude_users_that_do_not_match_1(self):
        user = User.objects.create()
        Item.objects.create(item_id=12345, user=user)
        result = self.search('12345 67890')
        assert_false(result[0].exists())
    def test_exclude_users_that_do_not_match_2(self):
        user = User.objects.create()
        result = self.search('12345 67890')
        assert_false(result[0].exists())
The following snippet is my best attempt using annotate that takes over 50 seconds.
def search_by_item_ids(queryset, item_ids):
    params = {}
    for i in item_ids:
        cond = Case(When(item__item_id=i, then=True), output_field=BooleanField())
        params['has_' + str(i)] = cond
    queryset = queryset.annotate(**params)
    params = {}
    for i in item_ids:
        params['has_' + str(i)] = True
    queryset = queryset.filter(**params)
    return queryset
Is there anything I can do to speed it up?
Here's some quick suggestions that should improve performance drastically.
Use prefetch_related on the initial queryset to get related items
queryset = User.objects.filter(...).prefetch_related('item_set')
Filter with the __in operator instead of looping through a list of IDs
def search_by_item_ids(queryset, item_ids):
    return queryset.filter(item__item_id__in=item_ids)
Don't annotate if it's already a condition of the query
Since you know that this queryset only consists of records with ids in the item_ids list, no need to write that per object.
Putting it all together
You can speed up what you are doing drastically just by calling -
queryset = User.objects.filter(
    item__item_id__in=item_ids
).prefetch_related('item_set')
with only 2 db hits for the full query.
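Note that an __in filter on its own matches users who own any of the listed ids, while the tests in the question require users who own all of them. A common way to express "all" is to group by user and compare the number of distinct matched ids with the number of requested ids. The sketch below shows that technique with Python's built-in sqlite3 module; the item table layout and the function name users_owning_all are assumptions for illustration, not the asker's schema.

```python
import sqlite3

# Hypothetical item table: one row per (user, item_id) ownership record.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, user_id INTEGER, item_id INTEGER);
    INSERT INTO item (user_id, item_id) VALUES
        (1, 12345), (1, 67890),   -- user 1 owns both ids
        (2, 12345),               -- user 2 owns only one of them
        (3, 67890), (3, 67890);   -- user 3 owns one id, stored twice
""")


def users_owning_all(connection, item_ids):
    """Return ids of users that own every id in item_ids."""
    wanted = sorted(set(item_ids))
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT user_id
        FROM item
        WHERE item_id IN ({placeholders})
        GROUP BY user_id
        HAVING COUNT(DISTINCT item_id) = ?
        ORDER BY user_id
    """
    return [row[0] for row in connection.execute(sql, [*wanted, len(wanted)])]


owners_of_both = users_owning_all(conn, [12345, 67890])
owners_of_one = users_owning_all(conn, [12345])
```

In ORM terms the same shape would be roughly queryset.filter(item__item_id__in=ids).annotate(n=Count('item__item_id', distinct=True)).filter(n=len(ids)); that expression is untested against the asker's data, and an index on the item_id column is likely to matter more for speed than the exact query form.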
I would like to augment one of my model admins with an interesting value. Given a model like this:
class Participant(models.Model):
    pass
class Registration(models.Model):
    participant = models.ForeignKey(Participant)
    is_going = models.BooleanField(verbose_name='Is going')
Now, I would like to show the number of other Registrations for this Participant where is_going is False. So, something akin to this SQL query:
SELECT reg.*, COUNT(past.id) AS not_going_num
FROM registrations AS reg, registrations AS past
WHERE past.participant_id = reg.participant_id AND
past.is_going = False
I think I can extend the Admin's queryset() method according to Django Admin, Show Aggregate Values From Related Model, by annotating it with the extra Count, but I still cannot figure out how to work the self-join and filter into this.
I looked at Self join with django ORM and Django self join , How to convert this query to ORM query, but the former is doing SELECT * AND the latter seems to have data model problems.
Any suggestions on how to solve this?
See edit history for previous version of the answer.
The admin implementation below will display "Not Going Count" for each Registration model. The "Not Going Count" is the count of is_going=False for the registration's participant.
from django.contrib import admin
from django.db import models
from django.db.models import Case, Sum, When
@admin.register(Registration)
class RegistrationAdmin(admin.ModelAdmin):
    list_display = ['id', 'participant', 'is_going', 'ng_count']
    def ng_count(self, obj):
        # Value comes from the annotation added in get_queryset().
        return obj.not_going_count
    ng_count.short_description = 'Not Going Count'
    def get_queryset(self, request):
        qs = super(RegistrationAdmin, self).get_queryset(request)
        qs = qs.filter(participant__registration__isnull=False)
        qs = qs.annotate(not_going_count=Sum(
            Case(
                When(participant__registration__is_going=False, then=1),
                default=0,
                output_field=models.IntegerField())
        ))
        return qs
Below is a more thorough explanation of the QuerySet:
qs = qs.filter(participant__registration__isnull=False)
The filter causes Django to perform two joins - an INNER JOIN to participant table, and a LEFT OUTER JOIN to registration table.
qs = qs.annotate(not_going_count=Sum(
    Case(
        When(participant__registration__is_going=False, then=1),
        default=0,
        output_field=models.IntegerField())
))
This is a standard aggregate, which will be used to SUM up the count of is_going=False. This translates into the SQL
SUM(CASE WHEN past."is_going" = False THEN 1 ELSE 0 END)
The sum is generated for each registration model, and the sum belongs to the registration's participant.
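The SQL that this kind of annotation aims for can be tried in isolation. The following sketch uses Python's built-in sqlite3 module with a hypothetical registration table (the table name and sample rows are assumptions); it joins the table to itself on participant_id and sums a CASE expression per registration.

```python
import sqlite3

# Hypothetical table: is_going is stored as 0 (False) or 1 (True).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE registration (
        id INTEGER PRIMARY KEY,
        participant_id INTEGER,
        is_going INTEGER
    );
    INSERT INTO registration VALUES
        (1, 1, 1), (2, 1, 0), (3, 1, 0),  -- participant 1: two not going
        (4, 2, 1);                        -- participant 2: none not going
""")

# Self-join: "reg" is the row being displayed, "past" ranges over every
# registration of the same participant (including reg itself).
rows = conn.execute("""
    SELECT reg.id,
           reg.participant_id,
           SUM(CASE WHEN past.is_going = 0 THEN 1 ELSE 0 END) AS not_going_count
    FROM registration AS reg
    JOIN registration AS past ON past.participant_id = reg.participant_id
    GROUP BY reg.id
    ORDER BY reg.id
""").fetchall()
```

Each registration of participant 1 gets a count of 2 and the single registration of participant 2 gets 0. The count includes the displayed row itself when that row has is_going false; if only other registrations should be counted, adding a condition such as past.id != reg.id to the join would exclude it.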
I might have misunderstood, but for a single participant you can do:
participant = Participant.objects.get(id=1)
not_going_count = Registration.objects.filter(participant=participant,
is_going=False).count()
For all participants,
from django.db.models import Count
Registration.objects.filter(is_going=False).values('participant') \
.annotate(not_going_num=Count('participant'))
Django doc about aggregating for each item in a queryset.
I am working with django-rest-framework and I have an API that returns me the info with a filter like this:
http://example.com/api/products?category=clothing&in_stock=True
--this returns me 10 items
But it also returns the whole Model data if I don't pass any filters; this is the default behavior.
http://example.com/api/products/
--this returns me more than 100 (all the Model Table)
How can I disable this default behavior? I mean, how can I make a filter necessary for this API to work? Or, even better, how can I make the last URL return an empty JSON response?
UPDATE
Here is some code:
serializers.py
class OEntradaDetalleSerializer(serializers.HyperlinkedModelSerializer):
    item = serializers.RelatedField(source='producto.item')
    descripcion = serializers.RelatedField(source='producto.descripcion')
    unidad = serializers.RelatedField(source='producto.unidad')
    class Meta:
        model = OEntradaDetalle
        fields = ('url','item','descripcion','unidad','cantidad_ordenada','cantidad_recibida','epc')
views.py
class OEntradaDetalleViewSet(BulkUpdateModelMixin,viewsets.ModelViewSet):
    filter_backends = (filters.DjangoFilterBackend,)
    filter_fields = ('cantidad_ordenada','cantidad_recibida','oentrada__codigo_proveedor','oentrada__folio')
    queryset = OEntradaDetalle.objects.all()
    serializer_class = OEntradaDetalleSerializer
urls.py
router2 = BulkUpdateRouter()
router2.register(r'oentradadetalle', OEntradaDetalleViewSet)
urlpatterns = patterns('',
    url(r'^api/',include(router2.urls)),
)
URL EXAMPLE
http://localhost:8000/api/oentradadetalle/?oentrada__folio=E01
THIS RETURNS ONLY SOME FILTERED VALUES
http://localhost:8000/api/oentradadetalle/
THIS RETURNS EVERYTHING IN THE MODEL (I need to remove this or make it return some empty data)
I would highly recommend using pagination, to prevent anyone from being able to return all of the results (which likely takes a while).
If you can spare the extra queries being made, you can always check if the filtered and unfiltered querysets match, and just return an empty queryset if that is the case. This would be done in the filter_queryset method on your view.
def filter_queryset(self, queryset):
    filtered_queryset = super(ViewSet, self).filter_queryset(queryset)
    # If filtering removed nothing, treat the request as unfiltered.
    if queryset.count() == len(filtered_queryset):
        return queryset.model.objects.none()
    return filtered_queryset
This will make one additional query for the count of the original queryset, and if it is the same as the filtered queryset, an empty queryset will be returned. If the queryset was actually filtered, it will be returned and the results will be what you are expecting.
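A different approach, sketched here as an assumption rather than as part of the answer above, is to look at the request's query parameters directly and return an empty result when none of the declared filter fields is present. This avoids the extra COUNT query, and it also avoids returning nothing when a real filter happens to match every row. The helper name has_active_filter is hypothetical; the field names are taken from filter_fields in the question.

```python
from typing import Iterable, Mapping

# Field names copied from filter_fields in the question's viewset.
FILTER_FIELDS = (
    "cantidad_ordenada",
    "cantidad_recibida",
    "oentrada__codigo_proveedor",
    "oentrada__folio",
)


def has_active_filter(query_params: Mapping[str, str],
                      filter_fields: Iterable[str]) -> bool:
    """Return True if at least one declared filter has a non-empty value."""
    return any(
        query_params.get(name) not in (None, "")
        for name in filter_fields
    )


# Plain dicts stand in for the request's query parameters here.
filtered_request = has_active_filter({"oentrada__folio": "E01"}, FILTER_FIELDS)
bare_request = has_active_filter({}, FILTER_FIELDS)
unrelated_param = has_active_filter({"page": "2"}, FILTER_FIELDS)
```

Inside the view, a check like this could run against self.request.query_params before filtering, returning queryset.none() when it is false.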