Django: Update multiple objects with regex

I want to remove the 'blog/' substring from the slug field of multiple objects, following the Django docs:
>>> import re
>>> from django.db.models import F
>>> p = re.compile('blog/')
>>> Blog.objects.update(slug=p.sub('', F('slug')))
Traceback (most recent call last):
File "<console>", line 1, in <module>
TypeError: expected string or buffer
I tried wrapping the F() expression in str(), and it runs without errors:
>>> Blog.objects.update(slug=p.sub('', str(F('slug'))))
but it writes (DEFAULT: ) into the slug field of every object.
Any suggestions?

To update multiple objects at once in Django with regular expressions over a queryset you can use a Func expression to access regex functions in your database:
from django.db import models
from django.db.models import F, Func, Value
pattern = Value(r'blog(/?)')  # the regex
replacement = Value(r'new-blog-slug\1')  # replacement string
flags = Value('g')  # regex flags
Blog.objects.update(
    slug=Func(
        F('slug'),
        pattern, replacement, flags,
        function='REGEXP_REPLACE',
        output_field=models.TextField(),
    )
)
Check your DB vendor documentation for details and specific functions support.
Use raw strings r'' in pattern and replacement to avoid having to escape the backslashes.
Reference matched substrings in replacement using \n with n from 1 to 9.
You can use F expressions to provide pattern, replacement and flags from fields of each instance:
pattern = F('pattern_field')
replacement = F('replacement_field')
flags = F('flags_field')
You can also use the Func expression to make annotations.
Currently there is an open pull request to add regular expressions database functions in Django. Once merged you will probably have RegexpReplace, RegexpStrIndex and RegexpSubstr function expressions available under django.db.models.functions to make your code more concise and have a single API unified across DB vendors.
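To see what the pattern and replacement from this answer do to a value before running them against the database, the same substitution can be tried in plain Python. This is only a sketch: Python's re module and your database's regex engine are different implementations, so edge cases may differ, and the helper name rewrite_slug and the sample slugs are made up for illustration.

```python
import re

# Same pattern and replacement strings as in the answer above.
# re.sub replaces every match by default, which is what the 'g' flag
# requests from PostgreSQL's regexp_replace.
pattern = re.compile(r'blog(/?)')
replacement = r'new-blog-slug\1'  # \1 re-inserts the optional slash


def rewrite_slug(slug):
    """Apply the substitution to a single slug value (hypothetical helper)."""
    return pattern.sub(replacement, slug)


print(rewrite_slug('blog/my-first-post'))  # new-blog-slug/my-first-post
print(rewrite_slug('about'))               # about

# The change the question asks for: drop the 'blog/' substring.
print(re.sub(r'blog/', '', 'blog/my-first-post'))  # my-first-post
```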

You can't do that. The update is done completely within the database, so it must be something translatable to SQL, which your code isn't. You'll need to iterate through and update:
for blog in Blog.objects.filter(slug__startswith='blog/'):
    blog.slug = blog.slug.replace('blog/', '')
    blog.save()

A little late, but for those who need a solution today:
Django 2.1 added the Replace database function (django.db.models.functions.Replace).
Usage example from the documentation:
>>> from django.db.models import Value
>>> from django.db.models.functions import Replace
>>> Author.objects.create(name='Margaret Johnson')
>>> Author.objects.create(name='Margaret Smith')
>>> Author.objects.update(name=Replace('name', Value('Margaret'), Value('Margareth')))
2
>>> Author.objects.values('name')
<QuerySet [{'name': 'Margareth Johnson'}, {'name': 'Margareth Smith'}]>

Related

Parse or get multiple key dictionary data from GET request

Datatables is sending to Django the following query string parameters:
action:remove
data[1][DT_RowId]:1
data[1][volume]:5.0
data[1][coeff]:35
data[2][DT_RowId]:2
data[2][volume]:4.0
data[2][coeff]:50
I can access the values like this:
print request.GET['data[1][volume]']
5.0
How can I access the key itself as a dictionary and its keys?
For example, I would like to access the value as data[1]['volume']. In addition, I need to access the keys; e.g. get 1 from data[1].
I think you will need to parse the keys yourself and convert them to a dictionary. This can be done quickly with Python's regular expression module.
import re
pattern = re.compile(r"data\[(?P<key_one>.*?)\]\[(?P<key_two>.*?)\]")
match = pattern.match('data[1][volume]')
key_one = match.group('key_one')
key_two = match.group('key_two')
print(key_one) # Should print 1
print(key_two) # Should print volume
See Python documentation of its regular expression library to learn more.
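Building on the regex above, here is a minimal sketch that groups all matching keys into a nested dictionary, so values can be read as data['1']['volume'] and the row ids are available as the outer keys. The function name group_datatables_params is hypothetical, and a plain dict stands in for request.GET (anything with an items() method that yields key/value pairs should behave the same way).

```python
import re
from collections import defaultdict

# Matches keys shaped like data[<row>][<field>], as in the query string above.
KEY_PATTERN = re.compile(r"^data\[(?P<row>[^\]]+)\]\[(?P<field>[^\]]+)\]$")


def group_datatables_params(params):
    """Turn flat 'data[1][volume]' keys into {'1': {'volume': ...}, ...}."""
    rows = defaultdict(dict)
    for key, value in params.items():
        match = KEY_PATTERN.match(key)
        if match is None:
            continue  # e.g. 'action', which is not a data[...][...] key
        rows[match.group("row")][match.group("field")] = value
    return dict(rows)


# Stand-in for request.GET; query string values always arrive as strings.
sample = {
    "action": "remove",
    "data[1][DT_RowId]": "1",
    "data[1][volume]": "5.0",
    "data[1][coeff]": "35",
    "data[2][DT_RowId]": "2",
    "data[2][volume]": "4.0",
    "data[2][coeff]": "50",
}

data = group_datatables_params(sample)
print(data["1"]["volume"])  # 5.0
print(sorted(data))         # ['1', '2']
```

Note that the row ids and values stay strings; convert them with int() or float() where needed.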

Always False Q object

In Django ORM, how does one go about creating a Q object that is always False?
This is similar to the question about always True Q objects, but the other way round.
Note that this doesn't work:
Foobar.objects.filter(~Q()) # returns a queryset which gives all objects
Why do I want a Q object instead of the simple False value? So that I can combine it with other Q values, like this for example:
condition = always_true_q_object
if something_or_other:
    condition = condition | foobar_that_returns_a_q_object()
if something_or_other2:
    condition = condition | foobar_that_returns_a_q_object2()
Note: Sam's answer is better. I've left this answer here instead of deleting it so that you can see the 'more hacky' answer that Sam is referring to
Original answer:
What about:
Q(pk__isnull=True)
or
Q(pk=None)
It seems hacky, but it appears to work. For example:
>>> FooBar.objects.filter(Q(x=10)|Q(pk__isnull=True))
[<FooBar: FooBar object>, ...]
>>> FooBar.objects.filter(Q(x=10)&Q(pk__isnull=True))
[]
However, note that it doesn't work as you might expect when OR'd with an empty Q().
>>> FooBar.objects.filter(Q()|Q(pk__isnull=True))
[]
The solution to this might be to use Q(pk__isnull=False) as the 'always True Q'.
>>> FooBar.objects.filter(Q(pk__isnull=False)|Q(pk__isnull=True))
[<FooBar: FooBar object>, ...]
>>> FooBar.objects.filter(Q(pk__isnull=False)&Q(pk__isnull=True))
[]
Using Q(pk__in=[]) seems to be a good way to represent this idiom.
As indicated by @fwip and the comments below, Django's ORM recognises this case and knows it always evaluates to FALSE. For example:
FooBar.objects.filter(Q(pk__in=[]))
correctly returns an empty QuerySet without involving any round trip to the database. While:
FooBar.objects.filter(
    (Q(pk__in=[]) & Q(foo="bar")) |
    Q(hello="world")
)
is optimised down to:
FooBar.objects.filter(
    Q(hello="world")
)
i.e. it recognises that Q(pk__in=[]) is always FALSE, hence the AND condition can never be TRUE, so is removed.
To see what queries are actually sent to the database, see: How can I see the raw SQL queries Django is running?
I don't have enough reputation to comment, but Sam Mason's answer (Q(pk__in=[])) has the advantage that it doesn't even perform a database query if used alone. Django (v1.10) seems smart enough to recognize that the condition is unsatisfiable, and returns an empty queryset without asking the database.
$ ./manage.py shell_plus
In [1]: from django.db import connection
In [2]: FooBar.objects.filter(Q(pk__in=[]))
Out[2]: <QuerySet []>
In [3]: connection.queries
Out[3]: []

In Django's template engine, is it possible to run a filter through an entire array?

For example, if I have an array of datetime.date objects, I would like to apply a date format filter to each of its elements, while still making use of the default string representation of the array.
Given a date array that looks like:
[datetime.date(2011, 2, 28), datetime.date(2011, 3, 1), datetime.date(2011, 3, 2)]
Assuming that I already passed it to the template's context, I'd like to do this in the template:
<script>
// ...
var dates = {{ my_date_array|date:'b d, Y' }};
// ...
</script>
so it produces:
var dates = ['Feb 28, 2011', 'Mar 1, 2011', 'Mar 2, 2011'];
..instead of having to loop through the elements of the array.
Is this possible by default, without creating a custom filter?
Looking at the source, I'd say that's not possible using the default date filter.
You will have to either use a loop in your template, or create a custom filter that accepts a list of date objects.
Update:
It should be relatively easy to create your own filter by making use of the existing one. For example:
from django.template.defaultfilters import date
from django import template
register = template.Library()
# Only mildly tested. Use with caution.
def datelist(values, arg=None):
    try:
        outstr = "', '".join([date(v, arg) for v in values])
    except TypeError:  # non-iterable?
        outstr = date(values, arg)
    return "['%s']" % outstr
register.filter('datelist', datelist)
If you don't like that approach for determining iterable objects, you could also use:
# on Python 3 the Iterable ABC lives in collections.abc
from collections.abc import Iterable
if isinstance(values, Iterable):
    # ....
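Another way to avoid the loop in the template, sketched here in plain Python rather than as a template filter, is to format the dates in the view and serialise the whole list once with json.dumps, then pass the resulting string to the template. The function name dates_as_json and the fixed English month table are assumptions for illustration; the month table is used instead of strftime so the output does not depend on the locale or on platform-specific padding flags.

```python
import datetime
import json

# Fixed English abbreviations, so the result is locale-independent.
MONTH_ABBR = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def dates_as_json(dates):
    """Format each date as e.g. 'Mar 1, 2011' and return a JSON array string."""
    labels = [f"{MONTH_ABBR[d.month - 1]} {d.day}, {d.year}" for d in dates]
    return json.dumps(labels)


my_date_array = [
    datetime.date(2011, 2, 28),
    datetime.date(2011, 3, 1),
    datetime.date(2011, 3, 2),
]
print(dates_as_json(my_date_array))
# ["Feb 28, 2011", "Mar 1, 2011", "Mar 2, 2011"]
```

The string is valid JavaScript as well as JSON, but template autoescaping would mangle the quotes, so it has to be emitted in a way that accounts for that.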

Is there a way to filter a django queryset based on string similarity (a la python difflib)?

I have a need to match cold leads against a database of our clients.
The leads come from a third party provider in bulk (thousands of records) and sales is asking us to (in their words) "filter out our clients" so they don't try to sell our service to an established client.
Obviously, there are misspellings in the leads. Charles becomes Charlie, Joseph becomes Joe, etc. So I can't really just do a filter comparing lead_first_name to client_first_name, etc.
I need to use some sort of string similarity mechanism.
Right now I'm using the lovely difflib to compare the leads' first and last names to a list generated with Client.objects.all(). It works, but because of the number of clients it tends to be slow.
I know that most sql databases have soundex and difference functions. See my test of it in the update below - it doesn't work as well as difflib.
Is there another solution? Is there a better solution?
Edit:
Soundex, at least in my db, doesn't behave as well as difflib.
Here is a simple test - look for "Joe Lopes" in a table containing "Joseph Lopes":
with temp (first_name, last_name) as (
    select 'Joseph', 'Lopes'
    union
    select 'Joe', 'Satriani'
    union
    select 'CZ', 'Lopes'
    union
    select 'Blah', 'Lopes'
    union
    select 'Antonio', 'Lopes'
    union
    select 'Carlos', 'Lopes'
)
select first_name, last_name
from temp
where difference(first_name+' '+last_name, 'Joe Lopes') >= 3
order by difference(first_name+' '+last_name, 'Joe Lopes')
The above returns "Joe Satriani" as the only match. Even reducing the similarity threshold to 2 doesn't return "Joseph Lopes" as a potential match.
But difflib does a much better job:
difflib.get_close_matches('Joe Lopes', ['Joseph Lopes', 'Joe Satriani', 'CZ Lopes', 'Blah Lopes', 'Antonio Lopes', 'Carlos Lopes'])
['Joseph Lopes', 'CZ Lopes', 'Carlos Lopes']
Edit after gruszczy's response:
Before writing my own, I looked for and found a T-SQL implementation of Levenshtein Distance in the repository of all knowledge.
In testing it, it still won't do a better matching job than difflib.
Which led me to research what algorithm is behind difflib. It seems to be a modified version of the Ratcliff-Obershelp algorithm.
Unhappily I can't seem to find some other kind soul who has already created a T-SQL implementation based on difflib's... I'll try my hand at it when I can.
If nobody else comes up with a better answer in the next few days, I'll grant it to gruszczy. Thanks, kind sir.
Soundex won't help you, because it's a phonetic algorithm. Joe and Joseph aren't similar phonetically, so Soundex won't mark them as similar.
You can try Levenshtein distance, which is implemented in PostgreSQL. It may be available in your database too; if not, you should be able to write a stored procedure that calculates the distance between two strings and use it in your query.
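For reference, this is roughly what a Levenshtein distance computes, sketched in plain Python with unit costs for insertion, deletion and substitution. It is an illustration of the metric, not the PostgreSQL implementation, and it would be far too slow to run against every client row from Python; it is mainly useful for checking which thresholds make sense for your names.

```python
def levenshtein(source, target):
    """Minimum number of single-character edits turning source into target."""
    # previous[j] holds the distance between the processed prefix of source
    # and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # distance from source[:i] to the empty string
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


print(levenshtein("Joe", "Joseph"))       # 3
print(levenshtein("Charles", "Charlie"))  # 2
```

The first result shows why a small fixed threshold can miss nickname pairs: Joe and Joseph are three edits apart even though a human sees them as the same name.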
It's possible with trigram_similar lookups since Django 1.10; see the docs for PostgreSQL-specific lookups and full text search.
As per andilabs' answer, you can use the Levenshtein function to create your own custom function. The Postgres docs give two signatures for levenshtein:
levenshtein(text source, text target, int ins_cost, int del_cost, int sub_cost) returns int
levenshtein(text source, text target) returns int
andilabs' answer uses only the second form. If you want a more advanced search with insertion/deletion/substitution costs, you can rewrite the function like this:
from django.db.models import Func
class Levenshtein(Func):
    # search_term is interpolated straight into the SQL, so never pass untrusted input here
    template = "%(function)s(%(expressions)s, '%(search_term)s', %(ins_cost)d, %(del_cost)d, %(sub_cost)d)"
    function = 'levenshtein'
    def __init__(self, expression, search_term, ins_cost=1, del_cost=1, sub_cost=1, **extras):
        super(Levenshtein, self).__init__(
            expression,
            search_term=search_term,
            ins_cost=ins_cost,
            del_cost=del_cost,
            sub_cost=sub_cost,
            **extras
        )
And call the function:
from django.db.models import F
Spot.objects.annotate(
    lev_dist=Levenshtein(F('name'), 'Kfaka', 3, 3, 1)  # ins = 3, del = 3, sub = 1
).filter(
    lev_dist__lte=2
)
If you are on Django with Postgres and don't want to use the trigram similarity lookups introduced in 1.10 (https://docs.djangoproject.com/en/2.0/ref/contrib/postgres/lookups/#trigram-similarity), you can implement this with Levenshtein like this:
The fuzzystrmatch extension is needed.
Add the Postgres extension to your database in psql:
CREATE EXTENSION fuzzystrmatch;
Let's define a custom function with which we can annotate a queryset. It takes just one extra argument, the search_term, and uses the Postgres levenshtein function (see docs):
from django.db.models import Func
class Levenshtein(Func):
    # search_term is interpolated straight into the SQL, so never pass untrusted input here
    template = "%(function)s(%(expressions)s, '%(search_term)s')"
    function = "levenshtein"
    def __init__(self, expression, search_term, **extras):
        super(Levenshtein, self).__init__(
            expression,
            search_term=search_term,
            **extras
        )
Then, anywhere else in the project, import the Levenshtein class defined above and F to pass the Django field:
from django.db.models import F
Spot.objects.annotate(
    lev_dist=Levenshtein(F('name'), 'Kfaka')
).filter(
    lev_dist__lte=2
)

Django: ordering numerical value with order_by

I'm in a situation where I must output a quite large list of objects by a CharField used to store street addresses.
My problem is that the data is ordered by character codes since it's a CharField, with predictable results: it sorts the numbers like this:
1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21....
The obvious step would be to change the CharField to a proper numeric type (IntegerField, say), but that cannot work because some addresses have apartments, like "128A".
I really don't know how to order this properly.
If you're sure there are only integers in the field, you could get the database to cast it as an integer via the extra method, and order by that:
MyModel.objects.extra(
    select={'myinteger': 'CAST(mycharfield AS INTEGER)'}
).order_by('myinteger')
Django is trying to deprecate the extra() method, but has introduced Cast() in v1.10. In sqlite (at least), CAST can take a value such as 10a and will cast it to the integer 10, so you can do:
from django.db.models import IntegerField
from django.db.models.functions import Cast
MyModel.objects.annotate(
    my_integer_field=Cast('my_char_field', IntegerField())
).order_by('my_integer_field', 'my_char_field')
which will return objects sorted by the street number first numerically, then alphabetically, e.g. ...14, 15a, 15b, 16, 16a, 17...
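The SQLite casting behaviour described above can be checked without Django, using the sqlite3 module from the standard library. This is a small sketch with a hypothetical address table and street_number column; it runs the same CAST ... AS INTEGER ordering directly in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address (street_number TEXT)")
conn.executemany(
    "INSERT INTO address (street_number) VALUES (?)",
    [("16",), ("15b",), ("2",), ("14",), ("15a",), ("17",), ("16a",)],
)

# SQLite casts text to an integer using the longest numeric prefix,
# so '15a' and '15b' both become 15 and the text column breaks the tie.
rows = conn.execute(
    "SELECT street_number FROM address "
    "ORDER BY CAST(street_number AS INTEGER), street_number"
).fetchall()
ordered = [row[0] for row in rows]
conn.close()

print(ordered)  # ['2', '14', '15a', '15b', '16', '16a', '17']
```

Other databases are stricter about casting text that is not purely numeric, so this prefix behaviour should not be assumed outside SQLite.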
If you're using PostgreSQL (not sure about MySQL) you can safely use the following code on char/text fields and avoid cast errors:
MyModel.objects.extra(
    select={'myinteger': "CAST(substring(charfield FROM '^[0-9]+') AS INTEGER)"}
).order_by('myinteger')
Great tip! It works for me! :) That's my code:
revisioned_objects = revisioned_objects.extra(select={'casted_object_id': 'CAST(object_id AS INTEGER)'}).extra(order_by = ['casted_object_id'])
I know I'm late on this, but it's closely related to the question and I had a hard time finding it:
You can put the Cast directly in the ordering option of your model.
from django.db import models
from django.db.models.functions import Cast
class Address(models.Model):
    street_number = models.CharField(max_length=16)  # max_length added so the field validates; the value is arbitrary
    class Meta:
        ordering = [
            Cast("street_number", output_field=models.IntegerField()),
        ]
From the doc about ordering:
You can also use query expressions.
And from the doc about database functions:
Functions are also expressions, so they can be used and combined with other expressions like aggregate functions. 
The problem you're up against is quite similar to how filenames get ordered when sorting by filename. There, you want "2 Foo.mp3" to appear before "12 Foo.mp3".
A common approach is to "normalize" numbers by expanding them to a fixed number of digits, and then to sort on the normalized form. That is, for purposes of sorting, "2 Foo.mp3" might expand to "0000000002 Foo.mp3".
Django won't help you here directly. You can either add a field to store the "normalized" address, and have the database order_by that, or you can do a custom sort in your view (or in a helper that your view uses) on address records before handing the list of records to a template.
In my case I have a CharField called name that holds mixed (int + string) values, for example "a1", "f65", "P", "55", etc.
I solved the issue by using the SQL cast (tested with Postgres and MySQL).
First I sort by the cast integer value, and then by the original value of the name field.
parking_slots = ParkingSlot.objects.all().extra(
    select={'num_from_name': 'CAST(name AS INTEGER)'}
).order_by('num_from_name', 'name')
This way, in any case, the correct sorting works for me.
In case you need to sort version numbers consisting of multiple numbers separated by a dot (e.g. 1.9.0, 1.10.0), here is a postgres-only solution:
from django.db import models
class VersionRecordManager(models.Manager):
    def get_queryset(self):
        return super().get_queryset().extra(
            select={
                'natural_version': "string_to_array(version, '.')::int[]",
            },
        )
    def available_versions(self):
        return self.filter(available=True).order_by('-natural_version')
    def last_stable(self):
        return self.available_versions().filter(stable=True).first()
class VersionRecord(models.Model):
    objects = VersionRecordManager()
    version = models.CharField(max_length=64, db_index=True)
    available = models.BooleanField(default=False, db_index=True)
    stable = models.BooleanField(default=False, db_index=True)
In case you want to allow non-numeric characters (e.g. 0.9.0 beta, 2.0.0 stable):
    def get_queryset(self):
        return super().get_queryset().extra(
            select={
                'natural_version':
                    "string_to_array( "
                    " regexp_replace( "  # Remove everything except digits
                    r" version, '[^\d\.]+', '', 'g' "  # and dots, then split string into
                    " ), '.' "  # an array of integers.
                    ")::int[] "
            }
        )
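The same ordering logic can be tried in plain Python, which is handy for checking what the SQL expression above is meant to produce. This is a sketch only: version_key is a hypothetical helper, it mirrors the "strip everything except digits and dots, then split on dots" step, and it raises ValueError if stripping leaves an empty component.

```python
import re


def version_key(version):
    """Turn '1.10.0' or '0.9.0 beta' into a tuple of ints such as (1, 10, 0)."""
    cleaned = re.sub(r"[^\d.]+", "", version)  # keep digits and dots only
    return tuple(int(part) for part in cleaned.split("."))


versions = ["1.9.0", "1.10.0", "0.9.0 beta", "2.0.0 stable"]
print(sorted(versions, key=version_key, reverse=True))
# ['2.0.0 stable', '1.10.0', '1.9.0', '0.9.0 beta']
```

Tuples of integers compare element by element, which is the same idea as comparing the int[] arrays in the Postgres query.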
I was looking for a way to sort the numeric chars in a CharField and my search led me here. The name fields in my objects are CC Licenses, e.g., 'CC BY-NC 4.0'.
Since extra() is going to be deprecated, I was able to do it this way:
from django.db.models import F, Func, IntegerField, Value
from django.db.models.functions import Cast
digits = Func(F('name'), Value(r'\D'), Value(''), Value('g'), function='regexp_replace')
MyObject.objects.annotate(sorting_int=Cast(digits, IntegerField())).order_by('-sorting_int')
Thus, MyObject with name='CC BY-NC 4.0' now has sorting_int=40.
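To make the effect of stripping non-digits concrete, here is the same transformation in plain Python. The helper name sorting_int mirrors the annotation above, and the license names other than 'CC BY-NC 4.0' are made-up samples. Note that all digits in the name are concatenated, and that a name without digits yields an empty string, which this sketch maps to None instead of trying to convert it.

```python
import re


def sorting_int(name):
    """Drop every non-digit character and read what is left as one integer."""
    digits = re.sub(r"\D", "", name)
    return int(digits) if digits else None


print(sorting_int("CC BY-NC 4.0"))  # 40
print(sorting_int("CC0 1.0"))       # 10  (digits '0', '1', '0' joined)
print(sorting_int("Public Domain")) # None

names = ["CC BY-SA 2.5", "CC BY-NC 4.0", "CC BY 3.0"]
print(sorted(names, key=sorting_int, reverse=True))
# ['CC BY-NC 4.0', 'CC BY 3.0', 'CC BY-SA 2.5']
```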
None of the answers in this thread worked for me because they assume numerical text. I found a solution that will work for a subset of cases. Consider this model:
class Block(models.Model):
    title = models.CharField(max_length=32)  # max_length added so the field validates; the value is arbitrary
Say I have titles that sometimes have leading characters followed by numerical characters. If I try to order normally:
>>> Block.objects.all().order_by('title')
<QuerySet [<Block: 1>, <Block: 10>, <Block: 15>, <Block: 2>, <Block: N1>, <Block: N12>, <Block: N4>]>
As expected, this is correct alphabetically, but makes no sense to us humans. The trick I used for this particular case is to replace any text I find with the number 9999, then cast the value to an integer and order by it.
For most cases that have leading characters this gives the desired result. See below:
>>> from django.db.models.expressions import RawSQL
>>> Block.objects.all()\
... .annotate(my_faux_integer=RawSQL("CAST(regexp_replace(title, '[A-Z]+', '9999', 'g') AS INTEGER)", []))\
... .order_by('my_faux_integer', 'title')
<QuerySet [<Block: 1>, <Block: 2>, <Block: 10>, <Block: 15>, <Block: N1>, <Block: N4>, <Block: N12>]>
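The 9999 trick is easy to reproduce in plain Python, which also makes its limits visible. In this sketch faux_integer is a hypothetical helper that mirrors the SQL expression above; like the SQL, it only handles uppercase letters, and titles with trailing letters (such as "12A") would turn into very large numbers.

```python
import re


def faux_integer(title):
    """Replace each run of uppercase letters with 9999, then read as an int."""
    return int(re.sub(r"[A-Z]+", "9999", title))


titles = ["1", "10", "15", "2", "N1", "N12", "N4"]
ordered = sorted(titles, key=lambda title: (faux_integer(title), title))

print(faux_integer("N12"))  # 999912
print(ordered)  # ['1', '2', '10', '15', 'N1', 'N4', 'N12']
```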