Django custom field - automatically add COLLATE to query

I'm trying to create a custom field which would automatically add COLLATE information into the WHERE part of SQL query:
class IgnoreDiacriticsField(models.TextField):
    def get_prep_lookup(self, lookup_type, value):
        if lookup_type == 'exact':
            return ' "' + self.get_prep_value(value) + '" COLLATE utf8_general_ci'
When I perform a query like this:
result = ModelClass.objects.filter(field='value')
then nothing is found, even though the query (print result.query) is valid and matches several rows. Am I doing something wrong?
The reason I'm adding the collation information is that I want to perform queries on those fields and ignore any diacritics.

Are you using MySQLdb 1.2.1p2 by any chance? From the Django documentation:
If you're using MySQLdb 1.2.1p2, Django's standard CharField class
will return unicode strings even with utf8_bin collation. However,
TextField fields will be returned as an array.array instance (from
Python's standard array module). There isn't a lot Django can do about
that, since, again, the information needed to make the necessary
conversions isn't available when the data is read in from the
database. This problem was fixed in MySQLdb 1.2.2, so if you want to
use TextField with utf8_bin collation, upgrading to version 1.2.2 and
then dealing with the bytestrings (which shouldn't be too difficult)
as described above is the recommended solution.
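One likely reason the custom field in the question finds nothing, offered as an assumption rather than something confirmed above: the string returned by get_prep_lookup is sent to the database as a bound parameter, so the quotes and the COLLATE keyword become part of the compared value, while print result.query interpolates parameters without quoting and therefore looks valid. A minimal sketch of that difference with the standard-library sqlite3 module (SQLite and its built-in NOCASE collation stand in for MySQL and utf8_general_ci; the table and column names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT)")
conn.execute("INSERT INTO person (name) VALUES ('Value')")

# COLLATE hidden inside the bound value: the driver passes the whole string
# as data, so the database looks for that literal text.
smuggled = '"value" COLLATE NOCASE'
as_parameter = conn.execute(
    "SELECT name FROM person WHERE name = ?", (smuggled,)
).fetchall()

# COLLATE written into the SQL text, with only the value bound as a parameter.
in_sql_text = conn.execute(
    "SELECT name FROM person WHERE name = ? COLLATE NOCASE", ("value",)
).fetchall()

print(as_parameter)  # no rows
print(in_sql_text)   # the stored row
```

If this is what happens, the COLLATE clause has to end up in the SQL text itself (for example through a custom lookup or a raw WHERE clause) rather than in the prepared value.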

Related

Looking up value in JSONField with unaccent and icontains

I have a Model with a JSONField:
class MyModel(models.Model):
    locale_names = models.JSONField()
The shape of the JSON Field is simple: keys are language codes (en, fr...) and values are translated strings.
I'm trying to build a search query that does an unaccented icontains search on a translated value:
MyModel.objects.filter(locale_names__en__unaccent__icontains="Test")
This does not give the expected results, because Django interprets "unaccent" as a key to look up in the JSON, rather than the unaccent PostgreSQL function:
-- Expected SQL query: something like
SELECT "app_model"."*" ...
FROM "app_model"
WHERE UPPER(UNACCENT(("app_model"."locale_names" ->> 'en')::text)) LIKE UPPER(UNACCENT('%Test%'))
LIMIT 21
-- Actual SQL query
SELECT "app_model"."*" ...
FROM "app_model"
WHERE UPPER(("app_model"."locale_names" #>> ARRAY['en','unaccent'])::text) LIKE UPPER('%Test%')
LIMIT 21
How can I tell Django to interpret __unaccent as the PostgreSQL function rather than a JSON path?
EDIT:
I'm using Django 3.2
Doing __unaccent__icontains lookups on regular CharFields works as expected.
Unfortunately, JSONField does not support unaccent lookup.
See the documentation:
The unaccent lookup can be used on CharField and TextField:
As a complement to @Benbb96's answer above, my workaround was to write the WHERE clause I needed using the soon-to-be-deprecated QuerySet.extra method:
MyModel.objects.extra(
    where=[
        "UPPER(UNACCENT((app_model.locale_names->>'en')::text)) LIKE UPPER(UNACCENT(%s))"
    ],
    # Wrap the term in % wildcards so LIKE performs a "contains" match.
    params=("%Test%",)
)
As requested by the Django team, I created a ticket with them so that this use case can be addressed without QuerySet.extra().
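The workaround above hard-codes the language key in the SQL string. A minimal sketch of a reusable variant follows; the helper name unaccent_icontains_extra, the language-code pattern, and the default table and column names are assumptions made for illustration, not part of Django. Because the key is formatted into the SQL text, it is validated first, while the search term stays a bound parameter:

```python
import re

# Assumption: locale keys look like "en", "fr" or "pt-br".
_LANG_RE = re.compile(r"[A-Za-z]{2,3}(-[A-Za-z0-9]{2,8})*")


def unaccent_icontains_extra(lang, needle, table="app_model", column="locale_names"):
    """Build where/params kwargs for QuerySet.extra() (hypothetical helper)."""
    if not _LANG_RE.fullmatch(lang):
        # The key ends up inside the SQL text, so reject anything unexpected.
        raise ValueError("unexpected language code: %r" % (lang,))
    where = (
        "UPPER(UNACCENT(({table}.{column}->>'{lang}')::text)) "
        "LIKE UPPER(UNACCENT(%s))"
    ).format(table=table, column=column, lang=lang)
    # The search term is passed as a parameter, wrapped in LIKE wildcards.
    return {"where": [where], "params": ["%" + needle + "%"]}


kwargs = unaccent_icontains_extra("en", "Test")
print(kwargs["where"][0])
print(kwargs["params"])
```

It would be used as MyModel.objects.extra(**kwargs). Note that % and _ inside the search term still act as LIKE wildcards in this sketch.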

Convert the value of a field in a django RawQueryset to a different django field type

I have a rather complex query that's generating a Django RawQuerySet. This specific query returns some fields that aren't part of the model that the RawQuerySet is based on, so I'm using .annotate(field_name=models.Value('field_name')) to attach it as an attribute to individual records in the RawQuerySet. The most important custom field is actually a uuid, which I use to compose URLs using Django's {% url %} functionality.
Here's the problem: I'm not using standard uuids inside my app, I'm using SmallUUIDs (compressed UUIDs.) These are stored in the database as native uuidfields then converted to shorter strings in python. So I need to somehow convert the uuid returned as part of the RawQuerySet to a SmallUUID for use inside a template to generate a URL.
My code looks somewhat like this:
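# Triple quotes let the SQL span several lines; the join refers to the table
# by its full name, so no alias is declared.
query = """
    SELECT othertable.uuid_field AS my_uuid
    FROM myapp_mymodel
    JOIN othertable ON myapp_mymodel.x = othertable.x
"""
MyModel.objects.annotate(
    my_uuid=models.Value('my_uuid'),
).raw(query)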
Now there is a logical solution here: models.Value accepts an optional kwarg called output_field, making the code look like this:
MyModel.objects.annotate(
    my_uuid=models.Value('my_uuid', output_field=SmallUUIDField()),
).raw(query)
But it doesn't work! That kwarg is completely ignored, and the type of the attribute is based on the type returned from the database rather than on output_field. In my case, I'm getting a uuid output because Postgres is returning a UUID type, but if I were to change the query to SELECT CAST(othertable.uuid_field AS text) AS my_uuid I'd get the attribute as a string. It appears that Django (at least version 1.11.12) doesn't actually care what is in that kwarg in this instance.
So here's what I'm thinking are my potential solutions, in no particular order:
Change the way the query is formatted somehow (either in Django or in the SQL)
Change the resulting RawQuerySet in some way before it's passed to the view
Change something inside the templates to convert the UUID to a smalluuid for use in the URL reverse process.
What are my best next steps here?
A couple of issues with your current approach:
1. Value() isn't doing what you think it is - your annotation is literally just annotating each row with the value "my_uuid" because that is what you have passed to it. It isn't looking up the field of that name (to do that you need to use F expressions).
2. Point 1 above doesn't matter anyway because as soon as you use raw() the annotation is ignored - which is why you see no effect coming from it.
Bottom line is that trying to annotate a RawQuerySet isn't going to be easy. There is a translations argument that it accepts, but I can't think of a way to get that to work with the type of join you are using.
The next best suggestion that I can think of is that you just manually convert the field into a SmallUUID object when you need it - something like this:
from smalluuid import SmallUUID

objects = MyModel.objects.raw(query)
for o in objects:
    # Take the hex string obtained from the database and convert it to a SmallUUID object.
    # If your database has a built-in UUID type you will need to do
    # SmallUUID(small=o.my_uuid) instead.
    my_uuid = SmallUUID(hex=o.my_uuid)
(I'm doing this in a loop just to illustrate - depending on where you need this you can do it in a template tag or view).
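If the converted value is needed in a template, one option is to attach it to each row in the view and pass the resulting list on. The sketch below runs without Django or smalluuid: shorten is a stand-in that assumes a URL-safe base64 encoding of the 16 UUID bytes (the real SmallUUID class may encode differently and should replace it), SimpleNamespace objects stand in for rows of the raw queryset, and the names attach_short_ids and my_short_uuid are made up for illustration.

```python
import base64
import uuid
from types import SimpleNamespace


def shorten(value):
    """Stand-in for SmallUUID: URL-safe base64 of the 16 UUID bytes."""
    # Accept either a uuid.UUID object or its hex/hyphenated string form.
    u = value if isinstance(value, uuid.UUID) else uuid.UUID(hex=str(value))
    return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")


def attach_short_ids(rows, source="my_uuid", target="my_short_uuid"):
    """Return the rows as a list, each with a converted copy of `source`."""
    converted = []
    for row in rows:
        setattr(row, target, shorten(getattr(row, source)))
        converted.append(row)
    return converted


# Stand-ins for rows of MyModel.objects.raw(query): one driver returned a
# uuid.UUID object, the other a string.
rows = [
    SimpleNamespace(my_uuid=uuid.UUID(int=0)),
    SimpleNamespace(my_uuid="00000000-0000-0000-0000-000000000001"),
]
rows = attach_short_ids(rows)
print([row.my_short_uuid for row in rows])
```

Returning a list rather than the queryset is deliberate here: iterating a raw queryset a second time may build fresh objects, which would lose the attached attribute.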

Is there a way to add "Collation" in to Django 1.3 query?

I need to make a get query like:
obj = Current.objects.get(Code='M01.C0001')
But the query raises a "MultipleObjectsReturned" error because the database has another record with a similar Unicode string, 'M01.Ç0001':
[<obj: M01.Ç0001>, <obj: M01.C0001>]
I tried to fetch the data with field lookups, but that did not work either.
I googled around but I didn't find a way to temporarily set the Collation for this query.
Is it possible to temporarily set collation during executing a get query in Django 1.3?
SOLUTION:
I solved my problem by using a raw Django query and adding COLLATE to the SQL string:
obj = Current.objects.raw("SELECT * FROM Current WHERE Code = 'M01.C0001' COLLATE utf8_bin;")
Collation is a database property, so you cannot do that.
Change the collation in the database instead.
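The effect of a per-query COLLATE, as used in the raw-query solution above, can be reproduced with the standard-library sqlite3 module. This is only an illustration under stated assumptions: SQLite stands in for MySQL, the custom accent_insensitive collation plays the role of the column's default collation, BINARY plays the role of utf8_bin, and the table name item is made up.

```python
import sqlite3
import unicodedata


def strip_accents(text):
    """Drop combining marks after Unicode decomposition (Ç becomes C)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_insensitive(a, b):
    """Collation callback: compare while ignoring accents and case."""
    ka, kb = strip_accents(a).casefold(), strip_accents(b).casefold()
    return (ka > kb) - (ka < kb)


conn = sqlite3.connect(":memory:")
conn.create_collation("accent_insensitive", accent_insensitive)
conn.execute("CREATE TABLE item (code TEXT COLLATE accent_insensitive)")
conn.executemany(
    "INSERT INTO item (code) VALUES (?)", [("M01.C0001",), ("M01.Ç0001",)]
)

code = "M01.C0001"
# The column's own collation treats C and Ç as equal: two rows match.
loose = conn.execute(
    "SELECT code FROM item WHERE code = ? ORDER BY rowid", (code,)
).fetchall()
# An explicit COLLATE on the comparison overrides it: one row matches.
strict = conn.execute(
    "SELECT code FROM item WHERE code = ? COLLATE BINARY ORDER BY rowid", (code,)
).fetchall()

print(loose)
print(strict)
```

In the Django raw query the value can be bound in the same way instead of being written into the SQL string, for example Current.objects.raw("SELECT * FROM Current WHERE Code = %s COLLATE utf8_bin", [code]).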

Django GROUP BY including unnecessary columns?

I have Django code as follows
qs = Result.objects.only('time')
qs = qs.filter(organisation_id=1)
qs = qs.annotate(Count('id'))
And it gets translated into the following SQL:
SELECT "myapp_result"."id", "myapp_result"."time", COUNT("myapp_result"."id") AS "id__count" FROM "myapp_result" WHERE "myapp_result"."organisation_id" = 1 GROUP BY "myapp_result"."id", "myapp_result"."organisation_id", "myapp_result"."subject_id", "myapp_result"."device_id", "myapp_result"."time", "myapp_result"."tester_id", "myapp_result"."data"
As you can see, the GROUP BY clause starts with the field I intended (id) but then it goes on to list all the other fields as well. Is there any way I can persuade Django not to specify all the individual fields like this?
As you can see, even with .only('time') that doesn't stop Django from listing all the other fields anyway, but only in this GROUP BY clause.
The reason I want to do this is to avoid the issue described here where PostgreSQL doesn't support annotation when there's a JSON field involved. I don't want to drop native JSON support (so I'm not actually using django-jsonfield). The query works just fine if I manually issue it without the reference to "myapp_result"."data" (the only JSON field on the model). So if I could just persuade Django not to refer to it, I'd be fine!
only() merely defers the loading of certain fields, i.e. it allows for lazy loading of big or unused fields. It should generally not be used unless you know exactly what you're doing and why you need it, as it is nothing more than a performance booster that often decreases performance with improper use.
What you're looking for is values() (or values_list()), which actually excludes certain fields instead of just lazy loading them. This will return dictionaries (or tuples) instead of model instances, but this is the only way to tell Django to not take other fields into account:
from django.db.models import Count

qs = (Result.objects.filter(organisation_id=1)
      .values('time').annotate(Count('id')))
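With values('time') placed before annotate(), the grouping should be done on time alone, so the JSON column no longer appears in GROUP BY. The sketch below shows roughly the SQL shape to expect, run against an in-memory SQLite table with made-up rows (an assumption for illustration; the SQL Django actually generates may differ in detail, for instance when the model defines a default ordering):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myapp_result ("
    "id INTEGER PRIMARY KEY, organisation_id INTEGER, time TEXT, data TEXT)"
)
conn.executemany(
    "INSERT INTO myapp_result (organisation_id, time, data) VALUES (?, ?, ?)",
    [
        (1, "09:00", "{}"),
        (1, "09:00", '{"a": 1}'),
        (1, "10:00", "{}"),
        (2, "09:00", "{}"),
    ],
)

# Only the selected column is grouped on; "data" is never referenced.
per_time = conn.execute(
    "SELECT time, COUNT(id) AS id__count FROM myapp_result "
    "WHERE organisation_id = 1 GROUP BY time ORDER BY time"
).fetchall()
print(per_time)
```

Note that this changes what is counted: the result is one row per distinct time with the number of results at that time, not one row per Result.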

Django making a query with custom collation

Is it possible to run a query with a collation different from the one the database table uses?
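Using extra() is a little messy. Something similar can now be achieved with a Func() expression (since Django 1.8):
from django.db.models import Func

# "function" carries the collation name; the template appends COLLATE to the expression.
username_ci = Func(
    'username',
    function='utf8_general_ci',
    template='(%(expressions)s) COLLATE "%(function)s"')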
This can be used in annotate():
User.objects.annotate(uname_ci=username_ci).filter(uname_ci='joeblow').exists()
Or in order_by() to override default collation rules when sorting:
User.objects.order_by(username_ci)
Now, it still may seem messy, but if you look at the docs and code of Func(), you will discover that it is very easy to subclass it and make a reusable collation setter.
I used this trick with Postgres database.
Here is how you can use a specific collation instead of the default collation for a given table/column. I'm assuming you always want that to be the case insensitive utf8_general_ci, but you can easily change that in the code or add it as a variable.
Note the use of the params kwarg instead of the db literal function. Params exists for the exact same purpose.
def iexact(**kw):
    fields = [['%s=%%s collate utf8_general_ci' % field, value] for (field, value) in kw.items()]
    return dict(where=[f[0] for f in fields], params=[f[1] for f in fields])

if User.objects.extra(**iexact(username='joeblow')).exists():
    status = "Found a user with this username!"
I solved this with a bit of a hack.
Django's extra method is like the raw method in that both use the given SQL statement directly:
MyModel.objects.extra(where=["name LIKE '%%" + name + "%%' COLLATE utf8_general_ci"])
But written like this, SQL injection is possible, so the name variable needs to be escaped. I searched a lot for a function that just escapes a string for the database. I found one in the MySQL-python package, but it can't escape unicode strings. The package also has a literal method on the connection, but using it requires an instance (maybe because escaping depends on the database's characteristics).
In the end I used Django's db.connection.cursor:
from django.db import connection
cursor = connection.cursor()
name = cursor.db.connection.literal(name)[1:-1] # [1:-1] excluding quotes
This way we also need an instance, but I suppose it does not require a live database connection, and I suppose the method is database independent. If I am wrong, please correct me.
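An alternative to escaping by hand is to keep the search term out of the SQL text altogether and pass it through the params argument of extra(), as the iexact helper earlier in this thread does. A minimal sketch for the LIKE case follows; the helper names icontains_collate and escape_like are made up, and the backslash escape character assumes MySQL's default LIKE behaviour:

```python
import re

_IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def escape_like(text, escape="\\"):
    """Escape LIKE wildcards so user input matches literally."""
    # The escape character itself goes first so later backslashes are not doubled.
    for ch in (escape, "%", "_"):
        text = text.replace(ch, escape + ch)
    return text


def icontains_collate(column, needle, collation="utf8_general_ci"):
    """Build where/params kwargs for QuerySet.extra() (hypothetical helper)."""
    # Column and collation names end up in the SQL text, so validate them.
    for name in (column, collation):
        if not _IDENTIFIER_RE.fullmatch(name):
            raise ValueError("unsafe identifier: %r" % (name,))
    where = "{0} LIKE %s COLLATE {1}".format(column, collation)
    # The user-supplied term is only ever sent as a bound parameter.
    return {"where": [where], "params": ["%" + escape_like(needle) + "%"]}


kwargs = icontains_collate("name", "50%_off' OR 1=1 --")
print(kwargs["where"][0])
print(kwargs["params"][0])
```

It would be used as MyModel.objects.extra(**kwargs); the quote in the search term is harmless because the database driver handles the quoting of parameters.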
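The solution above works. To get the reverse order, the following snippet can be used:
from django.db.models import Func

def sort_queryset(queryset, sort):  # hypothetical wrapper around the original fragment
    sort_value = sort.strip()
    if sort_value in ['name', '-name']:
        # Sort by name using the "C" collation instead of the column default.
        sort = Func('name', function='C', template='(%(expressions)s) COLLATE "%(function)s"')
    if sort_value in ['-name']:
        f_res = queryset.order_by(sort).reverse()
    else:
        f_res = queryset.order_by(sort)
    return f_res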