I have an app that lets the user search a database of about 100,000 documents for keywords or sentences.
I am using Django 1.11 and the Postgres full text search features described in the documentation.
However, I am running into the following problem and I was wondering if someone knows a solution:
I want to create a SearchQuery object for each word in the user's query, like so:
query typed in by the user in the input field: ['term1', 'term2', 'term3']
query = SearchQuery('term1') | SearchQuery('term2') | SearchQuery('term3')
vector = SearchVector('text')
Document.objects.annotate(rank=SearchRank(vector, query)).order_by('-rank').annotate(similarity=TrigramSimilarity(vector, query)).filter(similarity__gt=0.3).order_by('-similarity')
The problem is that I used 3 terms for my query in the example, but I want that number to be dynamic. A user could also supply 1 or 10 terms, and I do not know how to build the query assignment for an arbitrary number of terms.
I briefly thought about having the program write something like this to an empty file:
for query in terms:
    file.write(' | SearchQuery(%r)' % query)
But having a python program writing python code seems like a very convoluted solution. Does anyone know a better way to achieve this?
I've never used it, but to build a dynamic query you can just loop and combine:
compound_statement = SearchQuery(list_of_words[0])
for term in list_of_words[1:]:
    compound_statement = compound_statement | SearchQuery(term)
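But the documentation tells us that:
By default, all the words the user provides are passed through the stemming algorithms, and then it looks for matches for all of the resulting terms.
So a single SearchQuery built from the whole input already matches documents containing all of the terms. Are you sure you need one SearchQuery per word?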
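The loop above can also be written with `functools.reduce` and `operator.or_`, which ORs together any number of query objects and makes the empty-input case explicit. Django's `SearchQuery` objects support the `|` operator, so in the real app you would pass `query_cls=SearchQuery`. To keep this sketch runnable without Django and Postgres, it uses a hypothetical `FakeSearchQuery` stand-in that only records how the terms were combined; `combine_or` is likewise a made-up helper name.

```python
import operator
from functools import reduce


class FakeSearchQuery:
    """Hypothetical stand-in for SearchQuery: it only records how the
    terms were combined, so the example runs without Django."""

    def __init__(self, text):
        self.text = text

    def __or__(self, other):
        return FakeSearchQuery("(%s | %s)" % (self.text, other.text))


def combine_or(terms, query_cls=FakeSearchQuery):
    """OR together one query object per term, for any number of terms."""
    if not terms:
        raise ValueError("at least one search term is required")
    return reduce(operator.or_, (query_cls(term) for term in terms))


print(combine_or(["term1", "term2", "term3"]).text)  # ((term1 | term2) | term3)
```

In the Django view this would become `query = combine_or(terms, query_cls=SearchQuery)`, with `terms` being the list of words the user typed.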
Related
So I'm trying to find a nice way to execute an advanced filter using the LIKE statement in Django.
Let's say I have the following records in a table called elements:
id = 1, name = 'group[1].car[8]'
id = 2, name = 'group[1].car[9]'
id = 3, name = 'group[1].truck[1]'
id = 4, name = 'group[1].car[10]'
id = 5, name = 'group[1].carVendor[1]'
I would like to select all elements that look like group[x].car[y].
To query this in SQL I would do:
SELECT * FROM elements WHERE name LIKE 'group[%].car[%]'
Now, reading the Django documentation, I see that the only built-in LIKE-style lookups are the following:
contains: name LIKE '%something%'
startswith: name LIKE 'something%'
endswith: name LIKE '%something'
So the one I need is missing:
plain like: name LIKE 'group[%].car[%]'
I'm also using Django Rest Framework to write my API endpoints, and there too the available options are:
contains: name__contains = something
startswith: name__startswith = something
endswith: name__endswith = something
So also here, the one I need is missing:
plain like: name__like = 'group[%].car[%]'
Of course, I know I can write a raw SQL query through Django using the raw() method, but I would rather keep that as a last resort, because:
I need to make sure my customization is safe
I need to extend the customization to DRF
Can anybody think of a way to do this that fits naturally with both Django and Django Rest Framework?
You can use a regular expression (regex) [wiki] for this, with the __iregex lookup [Django-doc]:
Elements.objects.filter(name__iregex=r'^group\[.*\]\.car\[.*\]$')
Here the dot before car is escaped, so it only matches a literal '.'. If only digits are allowed between the square brackets, we can make it more specific:
# only digits between the square brackets
Elements.objects.filter(name__iregex=r'^group\[\d+\]\.car\[\d+\]$')
Since some of the specifications are a bit "complex", it is better to test your regex first: with regex101, for example, you can see which names will be matched and which will not.
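In the spirit of testing the regex first, here is a small standalone check with Python's `re` module against the sample names from the question. It also includes a hypothetical helper, `like_to_regex` (not part of Django or DRF), that turns a LIKE-style pattern into an anchored regex (`%` becomes `.*`, `_` becomes `.`, every other character is escaped), so the result could be passed to a `name__regex` or `name__iregex` filter. This assumes the database's regex flavour (for example PostgreSQL's) treats these simple backslash escapes the same way Python does, and it ignores LIKE's escape character.

```python
import re

# Sample names from the question
names = [
    "group[1].car[8]",
    "group[1].car[9]",
    "group[1].truck[1]",
    "group[1].car[10]",
    "group[1].carVendor[1]",
]


def like_to_regex(pattern: str) -> str:
    """Hypothetical helper: translate a SQL LIKE pattern into an anchored regex.

    % -> .*, _ -> ., every other character is escaped literally.
    LIKE's escape character is not handled.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "^" + "".join(parts) + "$"


# IGNORECASE mirrors the __iregex lookup
like_pattern = re.compile(like_to_regex("group[%].car[%]"), re.IGNORECASE)
# Stricter variant: only digits between the brackets, literal dot before "car"
digits_pattern = re.compile(r"^group\[\d+\]\.car\[\d+\]$", re.IGNORECASE)

print([n for n in names if like_pattern.search(n)])
print([n for n in names if digits_pattern.search(n)])
```

Both lists should contain only the three `car[...]` names; `truck[1]` and `carVendor[1]` are rejected.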
I have two models, one of which has a ForeignKey I'm trying to fill in. To do so, I look up the second model by a specific number and a date. The problem is that the second model has two date fields and I have to decide which one to use: under some circumstances model2_date1 is set to NULL, and in others it is not. If it is NULL, I have to use the second date field instead. I have something like this:
class MyModel1(models.Model):
    model2_key = models.ForeignKey('MyModel2', on_delete=models.CASCADE)
    model1_date=...
    model1_number=...
The second model:
class MyModel2(models.Model):
    model2_date1=...
    model2_date2=...
    model2_number=...
Now, how do I make the choice? I have looked up the documentation on F expressions, Q expressions, When expressions and Select, and I'm a little bit confused. How can I write a function that returns the matching MyModel2 object? I have something like this, but it won't work:
def _find_model2(searched_date, searched_number):
    searched_model2=MyModel2.objects.get(Q(model2_number=searched_number),
        Q(When(model2_date1__e='NULL', then=model2_date2) |
        Q(When(model2_date1__ne='NULL', then=model2_date1))=searched_date))
I am quite new to django, so any help will be appreciated.
I have made a workaround for this issue. I don't think it's the most efficient or elegant one, but it works. If anyone has a better solution, please post it.
First, I fetch all objects where either date falls within the range:
from_query = list(MyModel2.objects.filter(Q(model2_date1__range=
(datetime_min, datetime_max)) | Q(model2_date2__range=(datetime_min, datetime_max)),
model2_number=searched_number))
Then I iterate over found objects:
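to_return = []
for item in from_query:
    if item.model2_date1 is not None:
        # model2_date1 is set, so it alone decides whether the item matches
        if datetime_min <= item.model2_date1 <= datetime_max:
            to_return.append(item)
    elif datetime_min <= item.model2_date2 <= datetime_max:
        # model2_date1 is NULL, so fall back to model2_date2
        to_return.append(item)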
EDIT: I've come up with a solution. Adding model2_date1__isnull=True to the model2_date2 condition is enough. The solution now looks like this:
from_query = list(MyModel2.objects.filter(Q(model2_date1__range=(datetime_min, datetime_max)) |
    Q(Q(model2_date2__range=(datetime_min, datetime_max)),
      Q(model2_date1__isnull=True)),
    model2_number=searched_number))
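The rule in this question (use model2_date1 unless it is NULL, otherwise fall back to model2_date2) is exactly what SQL's COALESCE does, and Django exposes it as `django.db.models.functions.Coalesce` (available since Django 1.8). In the ORM that would look roughly like annotating `effective_date=Coalesce('model2_date1', 'model2_date2')` and then filtering on `effective_date__range` and `model2_number`; I haven't run that against the models above. The standalone sketch below applies the same rule to plain Python objects so it can be checked without a database. `Model2Row` and `find_model2` are hypothetical names, and the rows are made up to show that a row with a non-NULL model2_date1 outside the range is rejected even when its model2_date2 falls inside it.

```python
from datetime import date
from typing import List, NamedTuple, Optional


class Model2Row(NamedTuple):
    """Hypothetical stand-in for a MyModel2 row."""
    model2_number: int
    model2_date1: Optional[date]
    model2_date2: Optional[date]


def effective_date(row: Model2Row) -> Optional[date]:
    # Same rule as SQL COALESCE(model2_date1, model2_date2)
    return row.model2_date1 if row.model2_date1 is not None else row.model2_date2


def find_model2(rows: List[Model2Row], searched_number: int,
                date_min: date, date_max: date) -> List[Model2Row]:
    matches = []
    for row in rows:
        d = effective_date(row)
        if row.model2_number == searched_number and d is not None and date_min <= d <= date_max:
            matches.append(row)
    return matches


rows = [
    Model2Row(1, date(2020, 1, 5), None),                # date1 set, in range
    Model2Row(1, None, date(2020, 1, 10)),               # date1 NULL, date2 in range
    Model2Row(1, date(2019, 12, 1), date(2020, 1, 15)),  # date1 set, out of range
    Model2Row(2, None, date(2020, 1, 10)),               # wrong number
]
found = find_model2(rows, 1, date(2020, 1, 1), date(2020, 1, 31))
print(found == rows[:2])  # True
```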
I've got a working SQL query that I'm trying to write in Django (without resorting to raw()) and was hoping you might be able to help.
Broadly, I'm looking to nest two queries: the first calculates a COUNT, and then I calculate an AVERAGE of the COUNTs. (This gives the average number of items on a ticket, per location.)
The SQL that works is:
SELECT location_name, Avg(subq.num_tickets) FROM (
SELECT Count(ticketitem.id) AS num_tickets, location.name AS location_name
FROM ticketitem
JOIN ticket ON ticket.id = ticketitem.ticket_id
JOIN location ON location.id = ticket.location_id
JOIN location ON location.id = location.app_location_id
GROUP BY ticket_id, location.name) AS subq
GROUP BY subq.location_name;
For my Django code, I'm trying something like this:
# Get the first count
qs = TicketItem.objects.filter(<my complicated filter>).\
values('ticket__location__app_location__name','posticket').\
annotate(num_tickets=Count('id'))
# now get the average of the count
qs2 = qs.values('ticket__location__app_location__name').\
annotate(Avg('num_tickets')).\
order_by('ticket__location__app_location__name')
but that fails because num_tickets doesn't exist. I suspect I'm being slow; I'd love for someone to enlighten me!
Check out the section on aggregating annotations from the Django docs. Their example takes an average of a count.
I was playing around with this a bit in a manage.py shell, and I think the Django ORM might not be able to do that kind of annotation. Honestly, you're probably going to have to resort to a raw query or bring in something like https://github.com/Deepwalker/aldjemy, which would let you do it via SQLAlchemy.
When I was playing with this, I tried:
(my_model.objects.filter(...)
.values('parent_id', 'parent__name', 'thing')
.annotate(Count('thing'))
.values('name', 'thing__count')
.annotate(Avg('thing__count')))
This gave a lovely traceback: FieldError: Cannot compute Avg('thing__count'): 'thing__count' is an aggregate. That makes sense, since I doubt the ORM tries to convert that first GROUP BY into a nested query.
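If the ORM can't nest the two aggregations, one pragmatic fallback is to let the database do the inner GROUP BY (one row per ticket with its item count, as the values(...).annotate(Count('id')) query already does) and compute the outer average in Python. The sketch below mirrors the outer `AVG(...) ... GROUP BY location_name` of the SQL in the question. It assumes each row is a dict with hypothetical keys `location` and `num_items`; the real keys would be whatever your values() call produces, and the sample rows are made up.

```python
from collections import defaultdict
from statistics import mean


def average_items_per_location(rows):
    """Outer step of the nested SQL: AVG(num_items) GROUP BY location.

    `rows` is assumed to hold one dict per ticket, with hypothetical keys
    "location" and "num_items" (the result of the inner COUNT ... GROUP BY).
    """
    counts_by_location = defaultdict(list)
    for row in rows:
        counts_by_location[row["location"]].append(row["num_items"])
    return {loc: mean(counts) for loc, counts in counts_by_location.items()}


inner_rows = [
    {"location": "A", "num_items": 3},  # ticket 1
    {"location": "A", "num_items": 5},  # ticket 2
    {"location": "B", "num_items": 2},  # ticket 3
]
print(average_items_per_location(inner_rows))
```

This costs one query plus a pass over one row per ticket, which is fine for moderate volumes; for very large tables the raw SQL above keeps all the work in the database.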
I have two models, Sample and Run. A Sample can belong to multiple Runs. The Run model has a name field that I would like to use to filter Samples: I want to find all Samples that have a run whose name matches a given filter. In SQLAlchemy, I write this like:
Sample.query.filter(Sample.runs.any(Run.name.like('%test%'))).all()
In Django, I start with:
Sample.objects.filter(run__in=Run.objects.filter(name__icontains='test'))
or
Sample.objects.filter(run__name__icontains='test')
However, both of these produce duplicates, so I must add .distinct() to the end.
The Django approach of using distinct() has terrible performance when there are many predicates (because the distinct operation must run over a large number of candidate rows), whereas the SQLAlchemy version runs fine. The repeated rows come from the repeated LEFT OUTER JOINs for each predicate.
For example:
Sample.objects.filter(Q(**{'run__name__icontains': 'alex'}) |
Q(**{'run__name__icontains': 'baz'}) | ...)
EDIT: To make this a little more complicated, I do want the ability to have filters like:
((Q(**{'run__name__icontains': 'alex'}) | Q(**{'name__icontains': 'alex'}))
 & (Q(**{'run__name__icontains': 'baz'}) | Q(**{'name__icontains': 'baz'})))
which has a SQLAlchemy query like:
clause1 = Sample.runs.any(Run.name.like('%alex%')) | Sample.name.like('%alex%')
clause2 = Sample.runs.any(Run.name.like('%baz%')) | Sample.name.like('%baz%')
Sample.query.filter(clause1 & clause2)
Assuming this is your models.py:
from django.db import models
class Sample(models.Model):
    name = models.CharField(max_length=255)
class Run(models.Model):
    name = models.CharField(max_length=255)
    sample = models.ForeignKey(Sample, on_delete=models.CASCADE)
Since I wasn't able to figure out how to do this without using distinct() or raw() (and if you have to write your own SQL and can't rely on the ORM, what's the point :p), I recommend trying SQLAlchemy in place of the Django ORM, or using the two side by side, which should work in theory. Sorry I couldn't be of more help :(
Here is a fairly recent blog post that can help you do that:
http://rodic.fr/blog/sqlalchemy-django/
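For comparison, SQLAlchemy's `any()` compiles to an EXISTS subquery, which returns each sample at most once no matter how many of its runs match, so no DISTINCT is needed. The sketch below uses the standard-library sqlite3 module with hypothetical tables shaped like the models above (`sample`, and `run` with a `sample_id` foreign key) and made-up rows, to show the difference between a JOIN-based filter and an EXISTS-based one. Within the ORM, Django 1.11 added `Exists` and `OuterRef`, which should let you express the same subquery by annotating with `Exists(Run.objects.filter(sample=OuterRef('pk'), name__icontains='alex'))` and filtering on that annotation; I haven't benchmarked that against distinct().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE run (
        id INTEGER PRIMARY KEY,
        name TEXT,
        sample_id INTEGER REFERENCES sample(id)
    );
    INSERT INTO sample (id, name) VALUES (1, 's1'), (2, 's2');
    INSERT INTO run (name, sample_id) VALUES
        ('alex-a', 1), ('alex-b', 1), ('baz-1', 1), ('other', 2);
""")

# JOIN-based filter: one result row per matching run, hence the duplicates
join_rows = conn.execute("""
    SELECT sample.id
    FROM sample
    LEFT OUTER JOIN run ON run.sample_id = sample.id
    WHERE run.name LIKE '%alex%' OR run.name LIKE '%baz%'
    ORDER BY sample.id
""").fetchall()

# EXISTS-based filter (the shape SQLAlchemy's any() produces):
# each sample is returned at most once, so no DISTINCT is needed
exists_rows = conn.execute("""
    SELECT sample.id
    FROM sample
    WHERE EXISTS (
        SELECT 1 FROM run
        WHERE run.sample_id = sample.id
          AND (run.name LIKE '%alex%' OR run.name LIKE '%baz%')
    )
    ORDER BY sample.id
""").fetchall()

print(join_rows)    # [(1,), (1,), (1,)]
print(exists_rows)  # [(1,)]
```

SQLite's LIKE is case-insensitive for ASCII, which is close to what icontains does here.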
Is it possible to filter within an annotation?
In my mind, something like this (which doesn't actually work):
Student.objects.all().annotate(Count('attendance').filter(type="Excused"))
The resulting table would have every student with their number of excused absences. Looking through the documentation, filters can only be applied before or after the annotation, which would not yield the desired results.
A workaround is this:
for student in Student.objects.all():
    student.num_excused_absence = Attendance.objects.filter(student=student, type="Excused").count()
This works, but it runs one query per student, which can get impractically slow in a real application. I think this kind of statement is possible in SQL, but I would prefer to stay with the ORM if possible. I even tried making two separate queries (one for all students, another to get the total) and combining them with |, but the combination changed the total :(
Some thoughts after reading answers and comments
I solved the attendance problem using extra sql here.
Timmy's blog post was useful. My answer is based off of it.
hash1baby's answer works but seems as complex as SQL. It also requires executing SQL and then adding the result in a for loop. This is bad for me because I'm stacking lots of these filtering queries together. My solution builds up one big queryset with lots of filters and extra() calls and executes it all at once.
If performance is not an issue, I suggest the for loop workaround. It's by far the easiest to understand.
As of Django 1.8 you can do this directly in the ORM:
from django.db import models
students = Student.objects.all().annotate(num_excused_absences=models.Sum(
    models.Case(
        models.When(attendance__type='Excused', then=1),
        default=0,
        output_field=models.IntegerField()
    )))
Answer adapted from another SO question on the same topic
I haven't tested the sample above but did accomplish something similar in my own app.
You are correct: Django does not allow you to filter the related objects being counted without also applying the filter to the primary objects, and therefore excluding primary objects that have no related objects after filtering.
But, in a bit of abstraction leakage, you can count groups by using a values query.
So, I collect the absences in a dictionary, and use that in a loop. Something like this:
from itertools import groupby
from operator import itemgetter
from django.db.models import Count
# a query for students
students = Student.objects.all()
# a query to count the student attendances, grouped by type;
# ordering by student keeps each student's rows adjacent for groupby()
attendance_counts = (Attendance.objects.filter(student__in=students)
                     .values('student', 'type')
                     .annotate(abs=Count('pk'))
                     .order_by('student'))
# regroup that into a dictionary {student_pk -> {type -> count}}
attendance_s_t = {s: {row['type']: row['abs'] for row in g}
                  for s, g in groupby(attendance_counts, key=itemgetter('student'))}
# then use them efficiently:
for student in students:
    student.absences = attendance_s_t.get(student.pk, {}).get('Excused', 0)
Maybe this will work for you:
excused = Student.objects.filter(attendance__type='Excused').annotate(abs=Count('attendance'))
You need to filter the Students you're looking for first to just those with excused absences and then annotate the count of them.
Here's a link to the Django Aggregation Docs where it discusses filtering order.
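The difference between the Sum(Case(When(...))) answer and the filter-first answer is easiest to see in the SQL they roughly correspond to. The sketch below uses the standard-library sqlite3 module with hypothetical `student` and `attendance` tables and made-up rows: conditional aggregation over a LEFT OUTER JOIN keeps every student and counts only the excused rows, while filtering on the type first drops students who have no excused absences at all. For what it's worth, Django 2.0 and later also accept a `filter` argument on aggregates, e.g. `Count('attendance', filter=Q(attendance__type='Excused'))`, which expresses the conditional count directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE attendance (
        id INTEGER PRIMARY KEY,
        student_id INTEGER REFERENCES student(id),
        type TEXT
    );
    INSERT INTO student (id, name) VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO attendance (student_id, type) VALUES
        (1, 'Excused'), (1, 'Excused'), (1, 'Unexcused'), (2, 'Unexcused');
""")

# Roughly Sum(Case(When(type='Excused', then=1), default=0)):
# every student stays in the result, non-excused rows contribute 0.
conditional = conn.execute("""
    SELECT student.id,
           SUM(CASE WHEN attendance.type = 'Excused' THEN 1 ELSE 0 END)
    FROM student
    LEFT OUTER JOIN attendance ON attendance.student_id = student.id
    GROUP BY student.id
    ORDER BY student.id
""").fetchall()

# Roughly filter(attendance__type='Excused').annotate(Count('attendance')):
# students without an excused absence disappear from the result.
filtered_first = conn.execute("""
    SELECT student.id, COUNT(attendance.id)
    FROM student
    INNER JOIN attendance ON attendance.student_id = student.id
    WHERE attendance.type = 'Excused'
    GROUP BY student.id
    ORDER BY student.id
""").fetchall()

print(conditional)     # [(1, 2), (2, 0), (3, 0)]
print(filtered_first)  # [(1, 2)]
```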