Word count query in Django

Word count query in Django - django

Given a model with both Boolean and TextField fields, I want to do a query that finds records that match some criteria AND have more than "n" words in the TextField. Is this possible? e..g.:
class Item(models.Model):
...
notes = models.TextField(blank=True,)
has_media = models.BooleanField(default=False)
completed = models.BooleanField(default=False)
...
This is easy:
items = Item.objects.filter(completed=True,has_media=True)
but how can I filter for a subset of those records where the "notes" field has more than, say, 25 words?

Try this:
Item.objects.extra(where=["LENGTH(notes) - LENGTH(REPLACE(notes, ' ', ''))+1 > %s"], params=[25])
This code uses Django's extra queryset method to add a custom WHERE clause. The calculation in the WHERE clause basically counts the occurances of the "space" character, assuming that all words are prefixed by exactly one space character. Adding one to the result accounts for the first word.
Of course, this calculation is only an approximation to the real word count, so if it has to be precise, I'd do the word count in Python.

I dont know what SQL need to be run in order for the DB to do the work, which is really what we want, but you can monkey-patch it.
Make an extra fields named wordcount or something, then extend the save method and make it count all the words in notes before saving the model.
The it is trivial to loop over and there is still no chance that this denormalization of data will break since the save method is always run on save.
But there might be a better way, but if all else fails, this is what I would do.

Related

Django ORM: filtering on concatinated fields

In my app, I have a document number which consists of several fields of Document model like:
{{doc_code}}{{doc_num}}-{{doc_year}}
doc_num is an integer in the model, but for the user, it is a five digits string, where empty spaces are filled by zero, like 00024, or 00573.
doc_year is a date field in the model, but in full document number, it is the two last digits of the year.
So for users, the document number is for example - TR123.00043-22.
I want to implement searching on the documents list page.
One approach is to autogenerate the full_number field from doc_code, doc_num and doc_year fields in the save method of Document model and filter on this full_number.
Anothe is to use Concat function before using of filter on query.
First by concatinate full_code field
docs = Document.annotate(full_code=Concat('doc_code', 'doc_num', Value('-'), 'doc_year', output_field=CharField()))
and than filter by full_code field
docs = docs.filter(full_code__icontain=keyword)
But how to pass doc_num as five digits string and doc_year as two last digits of year to Concat function?
Or what could be a better solution for this task?

Concat will only take field names and string values, so you don't really have many options there that I know of.
As you note, you can set an extra field on save. That's probably the best approach if you are going to be using it in multiple places.
The save function would look something ike
def save(self, *args, **kwargs):
super().save()
self.full_code = str(self.doc_code) + f"{doc_num:05d}") + '-' + time.strftime("%y", doc_year))
self.save()
doc_num requires python>= 3.6, other methods for earlier pythons can be seen here
doc_year assumes it is a datetime type. If it is just a four digit int then something like str(doc_year)[-2:] should work instead.
Alternately, if you are only ever going to use it rarely you could loop through your recordset adding an additional field
docs=Document.objects.all() #or whatever filter is appropriate
for doc in docs:
doc.full_code = f"{doc.doc_code}{doc.doc_num}-{time.strftime("%y", doc_year)}
#or f"{doc.doc_code}{doc.doc_num}-{str(doc_year)[-2:]} if doc_year not datetime
and then convert it to a list so you don't make another DB call and lose your new field, and filter it via list comprehension.
filtered_docs = [x for x in list(docs) if search_term in x.full_code]
pass filtered_docs to your template and away you go.

Annotate one part of a range to a new field

So we've been using a DateTimeRangeField in a booking model to denote start and end. The rationale for this might not have been great —separate start and end fields might have been better in hindsight— but we're over a year into this now and there's no going back.
It's generally been fine except I need to annotate just the end datetime onto a related model's query. And I can't work out the syntax.
Here's a little toy example where I want a list of Employees with end of their last booking annotated on.
class Booking(models.Model):
timeframe = DateTimeRangeField()
employee = models.ForeignKey('Employee')
sq = Booking.objects.filter(employee=OuterRef('pk')).values('timeframe')
Employee.objects.annotate(last_on_site=Subquery(sq, output_field=DateTimeField()))
That doesn't work because the annotated value is the range, not the single value. I've tried a heap of modifiers (egs __1 .1 but nothing works).
Is there a way to get just the one value? I guess you could simulate this without the complication of the subquery just doing a simple values lookup. Booking.objects.values('timeframe__start') (or whatever). That's essentially what I'm trying to do here.

Thanks to some help in IRC, it turns out you can use the RangeStartsWith and RangeEndsWith model transform classes directly. These are the things that are normally just registered to provide you with a __startswith filter access to range values, but directly they can pull back the value.
In my example, that means just modifying the annotation slightly:
from django.contrib.postgres.fields.ranges import RangeEndsWith
sq = Booking.objects.filter(employee=OuterRef('pk')).values('timeframe')
Employee.objects.annotate(last_on_site=RangeEndsWith(Subquery(sq[:1])))

Filter multiple Django model fields with variable number of arguments

I'm implementing search functionality with an option of looking for a record by matching multiple tables and multiple fields in these tables.
Say I want to find a Customer by his/her first or last name, or by ID of placed Order which is stored in different model than Customer.
The easy scenario which I already implemented is that a user only types single word into search field, I then use Django Q to query Order model using direct field reference or related_query_name reference like:
result = Order.objects.filter(
Q(customer__first_name__icontains=user_input)
|Q(customer__last_name__icontains=user_input)
|Q(order_id__icontains=user_input)
).distinct()
Piece of a cake, no problems at all.
But what if user wants to narrow the search and types multiple words into search field.
Example: user has typed Bruce and got a whole lot of records back as a result of search.
Now he/she wants to be more specific and adds customer's last name to search.So the search becomes Bruce Wayne, after splitting this into separate parts I'm having Bruce and Wayne. Obviously I don't want to search Orders model because order_id is a single-word instance and it's sufficient to find customer at once so for this case I'm dropping it out of query at all.
Now I'm trying to match customer by both first AND last name, I also want to handle the scenario where the order of provided data is random, to properly handle Bruce Wayne and Wayne Bruce, meaning I still have customers full name but the position of first and last name aren't fixed.
And this is the question I'm looking answer for: how to build query that will search multiple fields of model not knowing which of search words belongs to which table.
I'm guessing the solution is trivial and there's for sure an elegant way to create such a dynamic query, but I can't think of a way how.

You can dynamically OR a variable number of Q objects together to achieve your desired search. The approach below makes it trivial to add or remove fields you want to include in the search.
from functools import reduce
from operator import or_
fields = (
'customer__first_name__icontains',
'customer__last_name__icontains',
'order_id__icontains'
)
parts = []
terms = ["Bruce", "Wayne"] # produce this from your search input field
for term in terms:
for field in fields:
parts.append(Q(**{field: term}))
query = reduce(or_, parts)
result = Order.objects.filter(query).distinct()
The use of reduce combines the Q objects by ORing them together. Credit to that part of the answer goes to this answer.

The solution I came up with is rather complex, but it works exactly the way I wanted to handle this problem:
search_keys = user_input.split()
if len(search_keys) > 1:
first_name_set = set()
last_name_set = set()
for key in search_keys:
first_name_set.add(Q(customer__first_name__icontains=key))
last_name_set.add(Q(customer__last_name__icontains=key))
query = reduce(and_, [reduce(or_, first_name_set), reduce(or_, last_name_set)])
else:
search_fields = [
Q(customer__first_name__icontains=user_input),
Q(customer__last_name__icontains=user_input),
Q(order_id__icontains=user_input),
]
query = reduce(or_, search_fields)
result = Order.objects.filter(query).distinct()

Filtering on the concatenation of two model fields in django

With the following Django model:
class Item(models.Model):
name = CharField(max_len=256)
description = TextField()
I need to formulate a filter method that takes a list of n words (word_list) and returns the queryset of Items where each word in word_list can be found, either in the name or the description.
To do this with a single field is straightforward enough. Using the reduce technique described here (this could also be done with a for loop), this looks like:
q = reduce(operator.and_, (Q(description__contains=word) for word in word_list))
Item.objects.filter(q)
I want to do the same thing but take into account that each word can appear either in the name or the description. I basically want to query the concatenation of the two fields, for each word. Can this be done?
I have read that there is a concatenation operator in Postgresql, || but I am not sure if this can be utilized somehow in django to achieve this end.
As a last resort, I can create a third column that contains the combination of the two fields and maintain it via post_save signal handlers and/or save method overrides, but I'm wondering whether I can do this on the fly without maintaining this type of "search index" type of column.

The most straightforward way would be to use Q to do an OR:
lookups = [Q(name__contains=word) | Q(description__contains=word)
for word in words]
Item.objects.filter(*lookups) # the same as and'ing them together
I can't speak to the performance of this solution as compared to your other two options (raw SQL concatenation or denormalization), but it's definitely simpler.

django filter depeding on character count of the related model's fields

I have two models such that
class Employer(models.Model):
code = models.CharField(null=False,blank=False,default="")
class JobTitle(models.Model):
employer = models.ForeignKey(Employer,unique=False,null=False,default=0)
name = models.CharField(max_length=1000,null=False,default="")
and I would like to get all employers whose jobtitle name is less than X chracters. How can I achive this in Django ?
Thanks

The correct code for this is
Employer.objects.filter(jobtitle__name__regex="^.{0,20}$")
This will select all the employers who have a job title name up to and including 20 characters long. Just replace the 20 with whatever number you need.
Note that if an Employer has multiple JobTitles whose name are less than 20 characters long, it will return that Employer in the list multiple times. If you don't want this to happen, you should add distinct() to the query as follows:
Employer.objects.filter(jobtitle__name__regex="^.{0,20}$").distinct()
You'll now only get the Employer back once, even if they have multiple short JobTitles.

try
Employer.objects.filter(jobtitle__name__regex="^\W{0, X}$")
When you use regex, you can filter records from database with provided regular expression. In this case all records with name which contains 0 to X signs will be returned

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js