Search a column for multiple words using Django queryset - django

I have an autocomplete box which needs to return results containing the input words. However, the input words can be partial and can appear in any order or position.
Example:
The values in database column (MySQL)-
Expected quarterly sales
Sales preceding quarter
Profit preceding quarter
Sales 12 months
Now if the user types quarter sales, it should return the first two rows.
I tried:
column__icontains=term  # searches only for '%quarter sales%' as one phrase, so it returns nothing
column__search=term  # matches any complete word, so it also returns the last row
**{'ratio_name__icontains': each_term for each_term in term.split()}  # every term maps to the same key, so only the last one ('sales') is searched
Is there a trick using regex, or maybe something built into Django that I am missing, since this is a common pattern?

Search engines are better for this task, but you can still do it with basic code.
If you're looking for strings containing "A" and "B", you can
Model.objects.filter(string__contains='A').filter(string__contains='B')
or
Model.objects.filter(Q(string__contains='A') & Q(string__contains='B'))
But really, you'd be better off going with a simple full-text search engine that needs little configuration, like Haystack/Whoosh.
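For an arbitrary number of words, the chained filters above amount to one LIKE condition per word, joined with AND. In Django that could be written as reduce(and_, (Q(ratio_name__icontains=t) for t in term.split())). The sketch below shows the same idea with the standard-library sqlite3 module so that it runs on its own; the table name ratio and the helper search_all_terms are hypothetical, and SQLite stands in for the MySQL database from the question.

```python
import sqlite3


def search_all_terms(conn, text):
    """Return names that contain every whitespace-separated word of `text`."""
    terms = text.split()
    if not terms:
        return []
    # One LIKE per term, joined with AND, so the order of the words does not matter.
    where = " AND ".join("name LIKE ?" for _ in terms)
    params = [f"%{term}%" for term in terms]
    rows = conn.execute(f"SELECT name FROM ratio WHERE {where} ORDER BY id", params)
    return [name for (name,) in rows]


# Hypothetical table holding the four values from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratio (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO ratio (name) VALUES (?)",
    [
        ("Expected quarterly sales",),
        ("Sales preceding quarter",),
        ("Profit preceding quarter",),
        ("Sales 12 months",),
    ],
)

print(search_all_terms(conn, "quarter sales"))
```

Note that % and _ inside a user's word act as LIKE wildcards in this sketch; the ORM's icontains lookup escapes them for you.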

The above answers using chained .filter() require entries to match ALL the filters.
For those wanting "inclusive or" or ANY type behaviour, you can use functools.reduce to chain together django's Q operator for a list of search terms:
from functools import reduce
from django.db.models import Q
list_of_search_terms = ["quarter", "sales"]
query = reduce(
    lambda a, b: a | b,
    (Q(column__icontains=term) for term in list_of_search_terms),
)
YourModel.objects.filter(query)

Related

How do I annotate a django queryset with StringAgg or ArrayAgg concatenating one column from multiple children rows?

Documents is the parent table.
Paragraphs is the child table.
Users filter Documents based on various search criteria.
Then I wish to annotate Documents with certain Paragraphs filtered by a text query.
The same text query is used to filter Documents and rank them (SearchRank). This ranking makes it necessary to start from Documents and annotate them with Paragraphs, instead of starting from Paragraphs and grouping them by Document.
The postgresql way of concatenating one text field from multiple rows in Paragraphs would be the following:
SELECT array_to_string(
    ARRAY(
        SELECT paragraph.text
        FROM paragraph
        WHERE document_id = '...'
        ORDER BY paragraph.number),
    ', ');
I am trying to translate this into django coding.
I have tried numerous django approaches, to no avail.
I can annotate 1 Paragraph.
Query_sum is a Q() object built from user input.
results = Documents.filter(Query_sum)
sub_paragraphs = Paragraphs.filter(Query_sum).filter(document=OuterRef('id'))
results = results.annotate(paragraphs=Subquery(sub_paragraphs.values('text')[:1], output_field=TextField()))
Problems start when I get rid of slicing [:1].
results = results.annotate(paragraphs=Subquery(sub_paragraphs.values('text'), output_field=TextField()))
I then get the following error:
"more than one row returned by a subquery used as an expression".
To fix that, I tried to use ArrayAgg and StringAgg.
I made quite a mess ;-)
The Documents queryset (result) should be annotated either with a list of relevant Paragraphs (ArrayAgg), or a string of Paragraphs separated by any delimiter (StringAgg).
Any idea of how to proceed? I would be extremely grateful
We can annotate each document with the number of its paragraphs that match the query, and order by that count, using annotate with Sum, Case and When:
from django.db.models import Case, IntegerField, Sum, Value, When
documents = Document.objects.annotate(
    matches=Sum(Case(
        # The lookup name depends on the related name of the paragraph -> document relationship
        When(paragraphs__text__icontains=search_string, then=Value(1)),
        default=Value(0),
        output_field=IntegerField(),
    ))
).order_by('-matches')
Then, to get all the paragraphs that match the query for each document, we can use prefetch_related with a Prefetch object, which lets us filter the prefetch operation:
from django.db.models import Prefetch
documents = documents.prefetch_related(Prefetch(
    'paragraphs',
    queryset=Paragraph.objects.filter(text__icontains=search_string),
    to_attr='matching_paragraphs',
))
You can then loop over the documents in ranked order, and each one will have a "matching_paragraphs" attribute that contains all of its matching paragraphs.
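prefetch_related fetches the related rows in a separate query and attaches them to their parents in Python. A rough standalone sketch of that idea, using the standard-library sqlite3 module instead of Django and PostgreSQL, is shown here; the table layout and sample rows are made up for illustration, and the final join plays the role that StringAgg would play in the database.

```python
import sqlite3
from collections import defaultdict

# Hypothetical schema and data standing in for Documents and Paragraphs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE document (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE paragraph (
        id INTEGER PRIMARY KEY,
        document_id INTEGER REFERENCES document(id),
        number INTEGER,
        text TEXT
    );
    INSERT INTO document (id, title) VALUES (1, 'Report'), (2, 'Memo');
    INSERT INTO paragraph (document_id, number, text) VALUES
        (1, 1, 'sales rose'), (1, 2, 'costs fell'), (1, 3, 'sales outlook'),
        (2, 1, 'no match here');
""")

search_string = "sales"

# Separate query: fetch only the matching paragraphs, ordered within each document.
rows = conn.execute(
    "SELECT document_id, text FROM paragraph "
    "WHERE text LIKE ? ORDER BY document_id, number",
    (f"%{search_string}%",),
)

# Attach the paragraphs to their parent document in Python.
matching = defaultdict(list)
for document_id, text in rows:
    matching[document_id].append(text)

# Either keep the list (like ArrayAgg) or join it into one string (like StringAgg).
joined = {doc_id: ", ".join(texts) for doc_id, texts in matching.items()}
print(joined)
```

Documents with no matching paragraph simply have no entry, so matching.get(doc_id, []) is a safe way to read the result.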

Filter multiple Django model fields with variable number of arguments

I'm implementing search functionality with an option of looking for a record by matching multiple tables and multiple fields in these tables.
Say I want to find a Customer by his/her first or last name, or by ID of placed Order which is stored in different model than Customer.
The easy scenario, which I have already implemented, is that the user types only a single word into the search field. I then use Django Q objects to query the Order model, using either a direct field reference or a related_query_name reference, like:
result = Order.objects.filter(
    Q(customer__first_name__icontains=user_input)
    | Q(customer__last_name__icontains=user_input)
    | Q(order_id__icontains=user_input)
).distinct()
Piece of cake, no problems at all.
But what if the user wants to narrow the search and types multiple words into the search field?
Example: the user has typed Bruce and got a whole lot of records back as a result of the search.
Now he/she wants to be more specific and adds the customer's last name to the search. So the search becomes Bruce Wayne; after splitting this into separate parts I have Bruce and Wayne. Obviously I don't want to search the Order model, because order_id is a single-word value that is sufficient to find the customer on its own, so for this case I'm dropping it from the query altogether.
Now I'm trying to match the customer by both first AND last name. I also want to handle the case where the order of the provided words is arbitrary, so that Bruce Wayne and Wayne Bruce are both handled properly: I still have the customer's full name, but the positions of the first and last name aren't fixed.
And this is the question I'm looking for an answer to: how do I build a query that searches multiple fields of a model without knowing which search word belongs to which field or table?
I'm guessing the solution is trivial and there's surely an elegant way to create such a dynamic query, but I can't think of one.
You can dynamically OR a variable number of Q objects together to achieve your desired search. The approach below makes it trivial to add or remove fields you want to include in the search.
from functools import reduce
from operator import or_
from django.db.models import Q
fields = (
    'customer__first_name__icontains',
    'customer__last_name__icontains',
    'order_id__icontains',
)
parts = []
terms = ["Bruce", "Wayne"]  # produce this from your search input field
for term in terms:
    for field in fields:
        parts.append(Q(**{field: term}))
query = reduce(or_, parts)
result = Order.objects.filter(query).distinct()
Here reduce combines the Q objects by ORing them together. Credit for that part goes to another answer.
The solution I came up with is rather complex, but it works exactly the way I wanted to handle this problem:
from functools import reduce
from operator import and_, or_
from django.db.models import Q
search_keys = user_input.split()
if len(search_keys) > 1:
    first_name_set = set()
    last_name_set = set()
    for key in search_keys:
        first_name_set.add(Q(customer__first_name__icontains=key))
        last_name_set.add(Q(customer__last_name__icontains=key))
    # first name matches any word AND last name matches any word
    query = reduce(and_, [reduce(or_, first_name_set), reduce(or_, last_name_set)])
else:
    search_fields = [
        Q(customer__first_name__icontains=user_input),
        Q(customer__last_name__icontains=user_input),
        Q(order_id__icontains=user_input),
    ]
    query = reduce(or_, search_fields)
result = Order.objects.filter(query).distinct()
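A slightly different rule that also handles both word orders is "every search word must appear in at least one of the fields". With Q objects that would be an AND over the words of an OR over the fields, for example reduce(and_, (reduce(or_, (Q(**{f: term}) for f in fields)) for term in terms)). The sketch below shows only the matching logic in plain Python on in-memory records; the sample orders and the helper names matches and search are hypothetical, and this is not the same rule as the solution above.

```python
def matches(record, search_text, fields):
    """True if every search word occurs in at least one of the given fields."""
    terms = search_text.lower().split()
    values = [str(record[field]).lower() for field in fields]
    # AND over the words, OR over the fields.
    return all(any(term in value for value in values) for term in terms)


# Hypothetical sample data: one flat record per order.
orders = [
    {"order_id": "A100", "first_name": "Bruce", "last_name": "Wayne"},
    {"order_id": "A101", "first_name": "Bruce", "last_name": "Banner"},
    {"order_id": "A102", "first_name": "Clark", "last_name": "Kent"},
]
fields = ("first_name", "last_name", "order_id")


def search(search_text):
    """Return the ids of all orders that match the search text."""
    return [o["order_id"] for o in orders if matches(o, search_text, fields)]


print(search("Bruce"))
print(search("Wayne Bruce"))
```

Because each word is checked against every field, there is no need to special-case single-word input or to drop order_id for multi-word input.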

django aggregate for multiple days

I have a model which has two attributes, date and length, plus others which are not relevant. I need to display a list of the sums of length for each day in a template.
The solution I've used so far is looping day by day and building a list of sums using aggregations like:
for day in month:
    sums.append(MyModel.objects.filter(date=day).aggregate(Sum('length')))
But it seems very inefficient to me because of the number of db lookups. Isn't there a better way to do this? Like caching everything and then filtering it without touching the db?
.values() can be used to group by date, so you will only get unique dates together with the sum of length fields via .annotate():
>>> from django.db.models import Sum
>>> MyModel.objects.values('date').annotate(total_length=Sum('length'))
From docs:
When .values() clause is used to constrain the columns that are returned in the result set, the method for evaluating annotations is slightly different. Instead of returning an annotated result for each result in the original QuerySet, the original results are grouped according to the unique combinations of the fields specified in the .values() clause.
Hope this helps.
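In SQL terms, the values('date').annotate(...) call corresponds to a single GROUP BY query. A minimal sketch with the standard-library sqlite3 module follows; the table name mymodel and the sample rows are assumptions made for illustration. It also shows one way to turn the grouped result into a per-day list, filling in 0 for days that have no rows.

```python
import sqlite3

# Hypothetical table with the two relevant columns from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mymodel (date TEXT, length INTEGER)")
conn.executemany(
    "INSERT INTO mymodel (date, length) VALUES (?, ?)",
    [("2024-01-01", 30), ("2024-01-01", 45), ("2024-01-02", 20)],
)

# One query for the whole range instead of one query per day.
rows = conn.execute(
    "SELECT date, SUM(length) AS total_length "
    "FROM mymodel GROUP BY date ORDER BY date"
).fetchall()
totals = dict(rows)

# Days without any rows are absent from the result, so default them to 0.
month = ["2024-01-01", "2024-01-02", "2024-01-03"]
sums = [totals.get(day, 0) for day in month]
print(sums)
```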

using two xpathselectors on the same page

I have a spider where the scraped items are 3: brand, model and price from the same page.
Brands and models are using the same sel.xpath, later extracted and differentiated by .re in loop. However, price item is using different xpath. How can I use or combine two XPathSelectors in the spider?
Examples:
for brand and model:
titles = sel.xpath('//table[@border="0"]//td[@class="compact"]')
for prices:
prices = sel.xpath('//table[@border="0"]//td[@class="cl-price-cont"]//span[4]')
Each XPath has been tested and exported individually. My problem is combining these two to construct the proper loop.
Any suggestions?
Thanks!
Provided you can differentiate all 3 kinds of items (brand, model, price) later, you can try using the XPath union operator (|) to bundle both XPath queries into one selector:
//table[@border="0"]//td[@class="compact"]
|
//table[@border="0"]//td[@class="cl-price-cont"]//span[4]
UPDATE:
Responding to your comment: the above is meant to be a single XPath string. I'm not using Python, but I think it should look roughly like this:
sel.xpath('//table[@border="0"]//td[@class="compact"] | //table[@border="0"]//td[@class="cl-price-cont"]//span[4]')
I believe you are having trouble associating the price with the make/model because both xpaths give you a list of all numbers, correct? Instead, what you want to do is build an xpath that will get you each row of the table. Then, in your loop, you can do further xpath queries to pull out the make/model/price.
import re
rows = sel.xpath('//table[@border="0"]/tr')  # get all the rows
for row in rows:
    # the leading "." keeps each query relative to the current row;
    # a bare "//" would search the whole document again
    make_model = row.xpath('.//td[@class="compact"]/text()').extract()
    # set make and model here using your regex, something like:
    make, model = re.match(r"^(.+?)\s(.+?)$", make_model[0]).groups()
    price = row.xpath('.//td[@class="cl-price-cont"]//span[4]/text()').extract()
    # do something with the make/model/price.
This way, you know that in each iteration of the loop, the make/model/price you're getting all go together.
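The row-by-row approach can be tried without Scrapy. The sketch below uses the standard library's xml.etree.ElementTree, which supports a subset of XPath, on a small made-up table; the markup, the brand and model names, and the simplified price path (a single span rather than span[4]) are assumptions for illustration only.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the scraped page.
PAGE = """
<table border="0">
  <tr><td class="compact">Acme Roadster</td><td class="cl-price-cont"><span>100</span></td></tr>
  <tr><td class="compact">Globex Cruiser</td><td class="cl-price-cont"><span>250</span></td></tr>
</table>
"""

table = ET.fromstring(PAGE.strip())
items = []
for row in table.findall("./tr"):
    # Paths start with "." so they only search inside the current row.
    title = row.find('.//td[@class="compact"]').text
    price = row.find('.//td[@class="cl-price-cont"]/span').text
    # First word is taken as the brand, the rest as the model.
    brand, model = re.match(r"^(\S+)\s+(.+)$", title).groups()
    items.append({"brand": brand, "model": model, "price": price})

print(items)
```

Each dictionary is built from a single row, so brand, model and price cannot get out of step with each other.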

Django annotation with nested filter

Is it possible to filter within an annotation?
In my mind something like this (which doesn't actually work)
Student.objects.all().annotate(Count('attendance').filter(type="Excused"))
The resultant table would have every student with the number of excused absences. Looking through documentation filters can only be before or after the annotation which would not yield the desired results.
A workaround is this:
for student in Student.objects.all():
    student.num_excused_absence = Attendance.objects.filter(student=student, type="Excused").count()
This works but runs one query per student, which can get impractically slow in a real application. I think this type of statement is possible in SQL, but I would prefer to stay with the ORM if possible. I even tried making two separate queries (one for all students, another to get the total) and combining them with |. The combination changed the total :(
Some thoughts after reading answers and comments
I solved the attendance problem using extra sql here.
Timmy's blog post was useful. My answer is based off of it.
hash1baby's answer works but seems equally complex as sql. It also requires executing sql then adding the result in a for loop. This is bad for me because I'm stacking lots of these filtering queries together. My solution builds up a big queryset with lots of filters and extra and executes it all at once.
If performance is no issue - I suggest the for loop work around. It's by far the easiest to understand.
As of Django 1.8 you can do this directly in the ORM:
from django.db import models
students = Student.objects.all().annotate(num_excused_absences=models.Sum(
    models.Case(
        # lookup name follows the question's Attendance relation
        models.When(attendance__type='Excused', then=1),
        default=0,
        output_field=models.IntegerField()
    )))
Answer adapted from another SO question on the same topic
I haven't tested the sample above but did accomplish something similar in my own app.
You are correct: django does not allow you to filter the related objects being counted without also applying the filter to the primary objects, which excludes any primary object that has no related objects left after filtering.
But, in a bit of abstraction leakage, you can count groups by using a values query.
So, I collect the absences in a dictionary, and use that in a loop. Something like this:
from collections import defaultdict
from django.db.models import Count
# a query for students
students = Student.objects.all()
# a query to count the student attendances, grouped by student and type
attendance_counts = (
    Attendance.objects.filter(student__in=students)
    .values('student', 'type')
    .annotate(abs=Count('pk'))
)
# regroup that into a dictionary {student pk -> {type -> count}}
# (a plain loop is used here; itertools.groupby would need sorted input)
attendance_s_t = defaultdict(dict)
for row in attendance_counts:  # each row is a dict with 'student', 'type' and 'abs'
    attendance_s_t[row['student']][row['type']] = row['abs']
# then use them efficiently:
for student in students:
    student.absences = attendance_s_t.get(student.pk, {}).get('Excused', 0)
Maybe this will work for you:
excused = Student.objects.filter(attendance__type='Excused').annotate(abs=Count('attendance'))
You need to filter the Students you're looking for first to just those with excused absences and then annotate the count of them.
Here's a link to the Django Aggregation Docs where it discusses filtering order.
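The difference between filtering first and putting the condition inside the aggregate is easiest to see in SQL. The sketch below uses the standard-library sqlite3 module with a made-up student and attendance schema (the table names, column names and rows are assumptions): the first query mirrors filter-then-annotate, the second mirrors the Sum(Case(When(...))) annotation.

```python
import sqlite3

# Hypothetical schema and data: Cy has no attendance rows at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE attendance (
        id INTEGER PRIMARY KEY,
        student_id INTEGER REFERENCES student(id),
        type TEXT
    );
    INSERT INTO student (id, name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO attendance (student_id, type) VALUES
        (1, 'Excused'), (1, 'Excused'), (1, 'Unexcused'),
        (2, 'Unexcused');
""")

# Filter first, then count: students without an excused absence drop out.
filtered = conn.execute("""
    SELECT s.name, COUNT(a.id)
    FROM student s JOIN attendance a ON a.student_id = s.id
    WHERE a.type = 'Excused'
    GROUP BY s.id ORDER BY s.id
""").fetchall()

# Condition inside the aggregate: every student is kept, with 0 where nothing matches.
conditional = conn.execute("""
    SELECT s.name, SUM(CASE WHEN a.type = 'Excused' THEN 1 ELSE 0 END)
    FROM student s LEFT JOIN attendance a ON a.student_id = s.id
    GROUP BY s.id ORDER BY s.id
""").fetchall()

print(filtered)
print(conditional)
```

So the filter-first version is fine when only students with at least one excused absence are wanted, while the conditional aggregate is the one that yields a row for every student.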