I'm in a situation where I must output a fairly large list of objects ordered by a CharField used to store street addresses.
My problem is that the data is obviously ordered by character codes, since it's a CharField, with predictable results. It sorts the numbers like this:
1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21....
Now the obvious step would be to change the CharField to the proper field type (IntegerField, let's say); however, that cannot work since some addresses might have apartments, like "128A".
I really don't know how I can order this properly.
If you're sure there are only integers in the field, you could get the database to cast it as an integer via the extra method, and order by that:
MyModel.objects.extra(
    select={'myinteger': 'CAST(mycharfield AS INTEGER)'}
).order_by('myinteger')
Django is trying to deprecate the extra() method, but has introduced Cast() in v1.10. In sqlite (at least), CAST can take a value such as 10a and will cast it to the integer 10, so you can do:
from django.db.models import IntegerField
from django.db.models.functions import Cast
MyModel.objects.annotate(
    my_integer_field=Cast('my_char_field', IntegerField())
).order_by('my_integer_field', 'my_char_field')
which will return objects sorted by the street number first numerically, then alphabetically, e.g. ...14, 15a, 15b, 16, 16a, 17...
If you're using PostgreSQL (not sure about MySQL), you can safely use the following code on char/text fields and avoid cast errors:
MyModel.objects.extra(
    select={'myinteger': "CAST(substring(charfield FROM '^[0-9]+') AS INTEGER)"}
).order_by('myinteger')
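For a small result set, the same idea can also be applied in Python after the rows are fetched. The sketch below mirrors what the substring(... FROM '^[0-9]+') expression extracts; the helper name leading_number_key and the sample addresses are made up for illustration, and placing values without a leading number last is an assumption, not something the SQL above does.

```python
import re

def leading_number_key(address):
    """Sort key: leading integer first, then the full string as a tie-breaker."""
    match = re.match(r'[0-9]+', address)
    # Assumption: addresses without a leading number sort after all numbered ones.
    number = int(match.group()) if match else float('inf')
    return (number, address)

addresses = ['10', '2', '128A', '15b', '15a', '1']  # hypothetical sample data
print(sorted(addresses, key=leading_number_key))
```

This only makes sense when the list is small enough to sort in memory; for large tables the database-side ordering is the better fit.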
Great tip! It works for me! :) That's my code:
revisioned_objects = revisioned_objects.extra(select={'casted_object_id': 'CAST(object_id AS INTEGER)'}).extra(order_by = ['casted_object_id'])
I know that I'm late on this, but it's strongly related to the question and I had a hard time finding it:
You can put the Cast directly in the ordering option of your model.
from django.db import models
from django.db.models.functions import Cast

class Address(models.Model):
    street_number = models.CharField()

    class Meta:
        ordering = [
            Cast("street_number", output_field=models.IntegerField()),
        ]
From the doc about ordering:
You can also use query expressions.
And from the doc about database functions:
Functions are also expressions, so they can be used and combined with other expressions like aggregate functions.
The problem you're up against is quite similar to how filenames get ordered when sorting by filename. There, you want "2 Foo.mp3" to appear before "12 Foo.mp3".
A common approach is to "normalize" numbers by expanding them to a fixed number of digits, and then to sort on the normalized form. That is, for purposes of sorting, "2 Foo.mp3" might expand to "0000000002 Foo.mp3".
Django won't help you here directly. You can either add a field to store the "normalized" address, and have the database order_by that, or you can do a custom sort in your view (or in a helper that your view uses) on address records before handing the list of records to a template.
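A minimal sketch of that normalization, assuming a padding width of 10 digits is enough for your data. The function name normalize_numbers and the sample values are hypothetical. Note that it pads every run of digits in the string, not just the leading one.

```python
import re

def normalize_numbers(text, width=10):
    """Left-pad every run of digits with zeros so plain string sorting works."""
    return re.sub(r'[0-9]+', lambda m: m.group().zfill(width), text)

names = ['12 Foo.mp3', '2 Foo.mp3', '128A', '13']  # hypothetical sample data

# Option 1: store normalize_numbers(address) in an extra field and order_by it.
print(normalize_numbers('128A'))

# Option 2: sort in Python before handing the records to the template.
print(sorted(names, key=normalize_numbers))
```

Storing the normalized value lets the database do the ordering (and index it), at the cost of keeping the extra field in sync whenever the address changes.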
In my case I have a CharField called name, which holds mixed (int + string) values, for example "a1", "f65", "P", "55", etc.
I solved the issue by using the SQL cast (tested with Postgres and MySQL).
First I sort by the cast integer value, and then by the original value of the name field.
parking_slots = ParkingSlot.objects.all().extra(
    select={'num_from_name': 'CAST(name AS INTEGER)'}
).order_by('num_from_name', 'name')
This way, in any case, the correct sorting works for me.
In case you need to sort version numbers consisting of multiple numbers separated by a dot (e.g. 1.9.0, 1.10.0), here is a postgres-only solution:
from django.db import models

class VersionRecordManager(models.Manager):
    def get_queryset(self):
        return super().get_queryset().extra(
            select={
                'natural_version': "string_to_array(version, '.')::int[]",
            },
        )

    def available_versions(self):
        return self.filter(available=True).order_by('-natural_version')

    def last_stable(self):
        return self.available_versions().filter(stable=True).first()

class VersionRecord(models.Model):
    objects = VersionRecordManager()
    version = models.CharField(max_length=64, db_index=True)
    available = models.BooleanField(default=False, db_index=True)
    stable = models.BooleanField(default=False, db_index=True)
In case you want to allow non-numeric characters (e.g. 0.9.0 beta, 2.0.0 stable):
    def get_queryset(self):
        return super().get_queryset().extra(
            select={
                'natural_version':
                    "string_to_array( "
                    "   regexp_replace( "                   # Remove everything except digits
                    r"      version, '[^\d\.]+', '', 'g' "  # and dots, then split string into
                    "   ), '.' "                            # an array of integers.
                    ")::int[] "
            }
        )
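If you are not on Postgres, or only need to sort a handful of records in memory, a rough Python equivalent of the expression above is a tuple-of-integers sort key. This is a sketch, not a drop-in replacement: the name version_key is made up, and skipping empty segments is an assumption (the SQL cast would raise an error on them instead).

```python
import re

def version_key(version):
    """Turn '1.10.0 beta' into (1, 10, 0), ignoring everything but digits and dots."""
    cleaned = re.sub(r'[^0-9.]+', '', version)
    # Assumption: empty segments (e.g. from a trailing dot) are skipped.
    return tuple(int(part) for part in cleaned.split('.') if part)

versions = ['1.9.0', '1.10.0', '0.9.0 beta', '2.0.0 stable']  # sample values
print(sorted(versions, key=version_key, reverse=True))
```

Tuples compare element by element, which is the same reason the int[] array ordering works in the database.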
I was looking for a way to sort the numeric chars in a CharField and my search led me here. The name fields in my objects are CC Licenses, e.g., 'CC BY-NC 4.0'.
Since extra() is going to be deprecated, I was able to do it this way:
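from django.db.models import F, Func, IntegerField, Value
from django.db.models.functions import Cast
# Strip every non-digit character from name, then cast what is left to an integer.
digits = Func(F('name'), Value(r'\D'), Value(''), Value('g'), function='regexp_replace')
MyObject.objects.annotate(sorting_int=Cast(digits, IntegerField())).order_by('-sorting_int')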
Thus, MyObject with name='CC BY-NC 4.0' now has sorting_int=40.
None of the answers in this thread worked for me because they assume numerical text. I found a solution that works for a subset of cases. Consider this model:
class Block(models.Model):
    title = models.CharField()
Say I have titles that sometimes have leading letters followed by trailing digits. If I try to order normally:
>>> Block.objects.all().order_by('title')
<QuerySet [<Block: 1>, <Block: 10>, <Block: 15>, <Block: 2>, <Block: N1>, <Block: N12>, <Block: N4>]>
As expected, it's correct alphabetically, but makes no sense for us humans. The trick I used for this particular case is to replace any text I find with the number 9999, then cast the value to an integer and order by it.
For most cases that have leading characters this gets the desired result. See below:
>>> from django.db.models.expressions import RawSQL
>>> Block.objects.all()\
...     .annotate(my_faux_integer=RawSQL("CAST(regexp_replace(title, '[A-Z]+', '9999', 'g') AS INTEGER)", []))\
...     .order_by('my_faux_integer', 'title')
<QuerySet [<Block: 1>, <Block: 2>, <Block: 10>, <Block: 15>, <Block: N1>, <Block: N4>, <Block: N12>]>
I have recently been working with Django, and it has been confusing me a lot (although I also like it).
The problem I am facing right now is when I am looping, and in the loop modifying the queryset, in the next loop the .filter is not working.
So let's take the following simplified example:
I have a dictionary that is made from the queryset like this
animal_dict = {'chicken': 6,
               'cows': 7,
               'fish': 1,
               'sheep': 2}
The queryset is called self.animals
for key in animal_dict:
    if animal_dict[key] < 3:
        remove_animal = max(animal_dict, key=animal_dict.get)
        remove = self.animals.filter(animal = remove_animal)[-2:]
        self.animals = self.animals.difference(remove)
        animal_dict[remove_animal] = animal_dict[remove_animal] - 2
What I am trying to do is as follows: my goal is a balance among the animals. Since there are not enough fish, 2 of the animals with the highest n have to go (cows). Then in the second loop, since there are not enough sheep, 2 of the animals with the highest n have to go again (chicken).
The first time it loops (with fish), the .filter does exactly what it should. However, when it loops a second time (for sheep), remove = self.animals.filter(animal = remove_animal)[-2:] gives me output that is not in line with the animal = filter. When I print remove in the second loop, it returns a list of all different animals (instead of just one kind).
After the loops, the dict should look like this:
{'chicken': 4,
 'cows': 5,
 'fish': 1,
 'sheep': 2}
This is because cows go down by 2 first, as they are the max, and then chicken go down by 2, as they are the max at that point.
I am definitely missing some Django logic here, but to me this seems very strange. I hope the question is well-understood, else happy to clarify further.
As others have pointed out, every time you call self.animals.filter you are making a request to the database, and this should not be done in a loop.
It isn't very clear what you are trying to achieve, but it seems like you want the number of each type of animal to be (almost?) the same number, and that the only operation you can perform on it is reducing the number of animals.
It's always better to avoid loops if you can.
If you want them all to be the same number, and you can only reduce the number of animals you have, then the best solution would be
fewest_number = min(self.animals.values())
self.animals = {animal: fewest_number for animal in self.animals.keys()}
If you want to allow, say, a tolerance of +1 over the fewest animal:
fewest_number = min(self.animals.values()) + 1
self.animals = {animal: fewest_number for animal in self.animals.keys()}
If you can increase the number of each type of animal, then you could find the average:
average_number = sum(self.animals.values()) / len(self.animals)
self.animals = {animal: average_number for animal in self.animals.keys()}
Answering my own question here. Apparently it didn't work because only count(), order_by(), values(), values_list() and slicing are allowed on a union queryset. You can't filter a union queryset, and the same applies to .difference().
More about this here: Django: Filter a Queryset made of unions not working
Because by the second loop self.animals has been turned into a combined queryset, the filter function simply doesn't work. The weird thing is that this doesn't raise an error; it silently fails to filter, which makes it hard to detect.
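One way to sidestep the restriction is to work out which animals to remove on a plain dict first, and only build a queryset once at the end (for example by excluding primary keys), so that .filter() is never called on the result of .difference(). The sketch below covers only the counting step; the function name rebalance and its minimum and step parameters are hypothetical.

```python
def rebalance(counts, minimum=3, step=2):
    """For every animal below `minimum`, take `step` away from the current largest group."""
    counts = dict(counts)  # work on a copy, leave the caller's dict untouched
    removals = []          # which group lost `step` animals, in order
    for animal in list(counts):
        if counts[animal] < minimum:
            largest = max(counts, key=counts.get)
            counts[largest] -= step
            removals.append(largest)
    return counts, removals

animal_dict = {'chicken': 6, 'cows': 7, 'fish': 1, 'sheep': 2}
new_counts, removals = rebalance(animal_dict)
print(new_counts)
print(removals)
```

With the removals list in hand, the rows to drop can be collected per animal from the original, uncombined queryset.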
I'm trying to create a model where one of the fields should be an Age field, but instead of being a simple number (IntegerField), it needs to be a choice among several available age ranges (5-8, 8-12, 12-18, 18-99, 5-99). I'm looking at the documentation of Choices, but I'm not even sure I can use an IntegerRangeField directly in this, so I ended up with something like this:
class Person(models.Model):
    FIRST_RANGE = IntegerRangeField(blank=True, validators=[MinValueValidator(5), MaxValueValidator(8)])
    SECOND_RANGE = IntegerRangeField(blank=True, validators=[MinValueValidator(8), MaxValueValidator(12)])
    THIRD_RANGE = IntegerRangeField(blank=True, validators=[MinValueValidator(12), MaxValueValidator(18)])
    FOURTH_RANGE = IntegerRangeField(blank=True, validators=[MinValueValidator(18), MaxValueValidator(99)])
    FIFTH_RANGE = IntegerRangeField(blank=True, validators=[MinValueValidator(5), MaxValueValidator(99)])
    AGE_CHOICES = (
        (FIRST_RANGE, '5-8'),
        (SECOND_RANGE, '8-12'),
        (THIRD_RANGE, '12-18'),
        (FOURTH_RANGE, '18-99'),
        (FIFTH_RANGE, '5-99'),
    )
    age = models.IntegerRangeField(blank=True, choices=AGE_CHOICES)
Is this the correct approach for this? It looks a bit awkward to me. I'm considering just using a CharField instead, although I'd like to end up with a Range on this field...
Thanks!
From the documentation of Range Fields in django:
All of the range fields translate to psycopg2 Range objects in python, but also accept tuples as input if no bounds information is necessary. The default is lower bound included, upper bound excluded.
It seems you can use tuples to create the choices.
FIRST_RANGE = (5, 8) # here 5 is included and 8 is excluded
# and similarly create the other ranges and then use in AGE_CHOICES
Alternatively, you can create the Range objects.
from psycopg2.extras import Range
FIRST_RANGE = Range(lower=5, upper=8, bounds='[)')
# bounds: one of the literal strings (), [), (], [], representing whether the lower or upper bounds are included
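To make the default "lower bound included, upper bound excluded" semantics concrete, here is a small plain-Python sketch. It only illustrates the '[)' convention and the (value, label) shape that choices expects; whether the field accepts these tuples as choice values is the assumption made in this answer, and the helper in_range is hypothetical.

```python
AGE_RANGES = [(5, 8), (8, 12), (12, 18), (18, 99), (5, 99)]

# (value, label) pairs in the shape Django's `choices` option expects.
AGE_CHOICES = [((lower, upper), '{}-{}'.format(lower, upper)) for lower, upper in AGE_RANGES]

def in_range(age, bounds):
    """'[)' semantics: the lower bound is included, the upper bound is excluded."""
    lower, upper = bounds
    return lower <= age < upper

print(AGE_CHOICES[0])
print(in_range(8, (5, 8)), in_range(8, (8, 12)))
```

With half-open bounds an age of 8 falls into 8-12 only, so the adjacent ranges do not overlap.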
A lot of websites will display:
"1.8K pages" instead of "1,830 pages"
or
"43.2M pages" instead of "43,200,123 pages"
Is there a way to do this in Django?
For example, the following code will generate the quantified amount of objects in the queryset (i.e. 3,123):
Books.objects.all().count()
Is there a way to add a custom count filter to return "3.1K pages" instead of "3,123 pages"?
Thank you in advance!
First off, I wouldn't do anything that alters the way the ORM portion of Django works. There are two places this could be done, if you are only planning on using it in one place - do it on the frontend. With that said, there are many ways to achieve this result. Just to spout off a few ideas, you could write a property on your model that calls count then converts that to something a little more human readable for the back end. If you want to do it on the frontend you might want to find a JavaScript lib that could do the conversion.
I will edit this later from my computer and add an example of the property.
Edit: To answer your comment, the easier one to implement depends on your skills in python vs in JavaScript. I prefer python so I would probably do it in there somewhere on the model.
Edit2: I have written an example to show how I would do a classmethod on a base model or on the model that you need these numbers on. I found a Python package called humanize, took its function that converts numbers to a readable form, modified it a bit to allow for thousands, and took out some of the very large number conversions.
def readable_number(value, short=False):
    # Modified from the package `humanize` on PyPI.
    powers = [10 ** x for x in (3, 6, 9, 12, 15, 18)]
    human_powers = ('thousand', 'million', 'billion', 'trillion', 'quadrillion')
    human_powers_short = ('K', 'M', 'B', 'T', 'QD')
    try:
        value = int(value)
    except (TypeError, ValueError):
        return value
    if value < powers[0]:
        return str(value)
    for ordinal, power in enumerate(powers[1:], 1):
        if value < power:
            chopped = value / float(powers[ordinal - 1])
            chopped = format(chopped, '.1f')
            if not short:
                return '{} {}'.format(chopped, human_powers[ordinal - 1])
            return '{}{}'.format(chopped, human_powers_short[ordinal - 1])
    return str(value)  # 10 ** 18 and above: no suffix defined, fall back to the plain number

class MyModel(models.Model):
    @classmethod
    def readable_count(cls, short=True):
        count = cls.objects.all().count()
        return readable_number(count, short=short)

print(readable_number(62220, True))  # Prints '62.2K'
print(readable_number(6555500))  # Prints '6.6 million'
I would stick that readable_number in some sort of utils and just import it in your models file. Once you have that, you can just stick that string wherever you would like on your frontend.
You would use MyModel.readable_count() to get that value. If you want it under MyModel.objects.readable_count() you will need to make a custom object manager for your model, but that is a bit more advanced.
I have a need to match cold leads against a database of our clients.
The leads come from a third-party provider in bulk (thousands of records) and sales is asking us to (in their words) "filter out our clients" so they don't try to sell our service to an established client.
Obviously, there are misspellings in the leads. Charles becomes Charlie, Joseph becomes Joe, etc. So I can't really just do a filter comparing lead_first_name to client_first_name, etc.
I need to use some sort of string similarity mechanism.
Right now I'm using the lovely difflib to compare the leads' first and last names to a list generated with Client.objects.all(). It works, but because of the number of clients it tends to be slow.
I know that most sql databases have soundex and difference functions. See my test of it in the update below - it doesn't work as well as difflib.
Is there another solution? Is there a better solution?
Edit:
Soundex, at least in my db, doesn't behave as well as difflib.
Here is a simple test - look for "Joe Lopes" in a table containing "Joseph Lopes":
with temp (first_name, last_name) as (
select 'Joseph', 'Lopes'
union
select 'Joe', 'Satriani'
union
select 'CZ', 'Lopes'
union
select 'Blah', 'Lopes'
union
select 'Antonio', 'Lopes'
union
select 'Carlos', 'Lopes'
)
select first_name, last_name
from temp
where difference(first_name+' '+last_name, 'Joe Lopes') >= 3
order by difference(first_name+' '+last_name, 'Joe Lopes')
The above returns "Joe Satriani" as the only match. Even reducing the similarity threshold to 2 doesn't return "Joseph Lopes" as a potential match.
But difflib does a much better job:
difflib.get_close_matches('Joe Lopes', ['Joseph Lopes', 'Joe Satriani', 'CZ Lopes', 'Blah Lopes', 'Antonio Lopes', 'Carlos Lopes'])
['Joseph Lopes', 'CZ Lopes', 'Carlos Lopes']
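To use this for the "is this lead already a client?" decision, the call can be wrapped in a small helper that returns only the best match above a cutoff. This is a sketch with made-up names (best_client_match, the cutoff default); the 0.6 cutoff is simply difflib's own default, not a tuned value.

```python
import difflib

def best_client_match(lead_name, client_names, cutoff=0.6):
    """Return the closest client name, or None if nothing reaches the cutoff."""
    matches = difflib.get_close_matches(lead_name, client_names, n=1, cutoff=cutoff)
    return matches[0] if matches else None

clients = ['Joseph Lopes', 'Joe Satriani', 'CZ Lopes', 'Blah Lopes',
           'Antonio Lopes', 'Carlos Lopes']

print(best_client_match('Joe Lopes', clients))
# The underlying similarity score is available through SequenceMatcher:
print(difflib.SequenceMatcher(None, 'Joe Lopes', 'Joseph Lopes').ratio())
```

Raising the cutoff trades missed matches for fewer false positives, so it is worth checking against a sample of known clients before relying on it.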
Edit after gruszczy's response:
Before writing my own, I looked for and found a T-SQL implementation of Levenshtein Distance in the repository of all knowledge.
In testing it, it still won't do a better matching job than difflib.
Which led me to research what algorithm is behind difflib. It seems to be a modified version of the Ratcliff-Obershelp algorithm.
Unhappily I can't seem to find some other kind soul who has already created a T-SQL implementation based on difflib's... I'll try my hand at it when I can.
If nobody else comes up with a better answer in the next few days, I'll grant it to gruszczy. Thanks, kind sir.
soundex won't help you, because it's a phonetic algorithm. Joe and Joseph aren't similar phonetically, so soundex won't mark them as similar.
You can try Levenshtein distance, which is implemented in PostgreSQL. Maybe in your database too and if not, you should be able to write a stored procedure, which will calculate the distance between two strings and use it in your computation.
It's possible with trigram_similar lookups since Django 1.10, see docs for PostgreSQL specific lookups and Full text search
As per the answer of andilabs you can use the Levenshtein function to create your custom function. Postgres doc indicates that the Levenshtein function is as follows:
levenshtein(text source, text target, int ins_cost, int del_cost, int sub_cost) returns int
levenshtein(text source, text target) returns int
andilabs' answer can only use the second form. If you want a more advanced search with insertion/deletion/substitution costs, you can rewrite the function like this:
from django.db.models import Func
class Levenshtein(Func):
    # Note: search_term and the costs are interpolated into the SQL text itself,
    # not passed as query parameters, so do not feed untrusted input into them.
    template = "%(function)s(%(expressions)s, '%(search_term)s', %(ins_cost)d, %(del_cost)d, %(sub_cost)d)"
    function = 'levenshtein'

    def __init__(self, expression, search_term, ins_cost=1, del_cost=1, sub_cost=1, **extras):
        super(Levenshtein, self).__init__(
            expression,
            search_term=search_term,
            ins_cost=ins_cost,
            del_cost=del_cost,
            sub_cost=sub_cost,
            **extras
        )
And call the function:
from django.db.models import F
Spot.objects.annotate(
    lev_dist=Levenshtein(F('name'), 'Kfaka', 3, 3, 1)  # ins = 3, del = 3, sub = 1
).filter(
    lev_dist__lte=2
)
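To get a feel for what the three cost arguments do before running this against the database, here is a plain-Python sketch of the weighted edit distance. It is the textbook dynamic-programming formulation, written for illustration; that it agrees with the Postgres levenshtein function in every case is an assumption, not something verified here.

```python
def levenshtein(source, target, ins_cost=1, del_cost=1, sub_cost=1):
    """Weighted edit distance needed to turn `source` into `target`."""
    # previous[j] = cost of turning the processed prefix of source into target[:j]
    previous = [j * ins_cost for j in range(len(target) + 1)]
    for i, s_char in enumerate(source, 1):
        current = [i * del_cost]
        for j, t_char in enumerate(target, 1):
            substitution = previous[j - 1] + (0 if s_char == t_char else sub_cost)
            current.append(min(previous[j] + del_cost,     # delete s_char
                               current[j - 1] + ins_cost,  # insert t_char
                               substitution))              # keep or replace
        previous = current
    return previous[-1]

print(levenshtein('Kfaka', 'Kafka'))           # unit costs
print(levenshtein('Kfaka', 'Kafka', 3, 3, 1))  # ins = 3, del = 3, sub = 1
```

Making insertions and deletions more expensive than substitutions, as in the call above, favors candidates of the same length as the search term.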
If you need to do this with Django and Postgres and don't want to use the trigram similarity introduced in 1.10 (https://docs.djangoproject.com/en/2.0/ref/contrib/postgres/lookups/#trigram-similarity), you can implement it using Levenshtein like this:
Extension needed: fuzzystrmatch
You need to add the Postgres extension to your DB in psql:
CREATE EXTENSION fuzzystrmatch;
Let's define a custom function with which we can annotate the queryset. It takes just one argument, the search_term, and uses the Postgres levenshtein function (see docs):
from django.db.models import Func
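class Levenshtein(Func):
    # Note: search_term is interpolated into the SQL text itself, not passed
    # as a query parameter, so do not feed untrusted input into it.
    template = "%(function)s(%(expressions)s, '%(search_term)s')"
    function = "levenshtein"

    def __init__(self, expression, search_term, **extras):
        super(Levenshtein, self).__init__(
            expression,
            search_term=search_term,
            **extras
        )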
Then, anywhere else in the project, we just import the Levenshtein class defined above, plus F to pass the Django field:
from django.db.models import F
Spot.objects.annotate(
    lev_dist=Levenshtein(F('name'), 'Kfaka')
).filter(
    lev_dist__lte=2
)
Nearly every kind of lookup in Django has a case-insensitive version, EXCEPT in, it appears.
This is a problem because sometimes I need to do a lookup where I am certain the case will be incorrect.
Products.objects.filter(code__in=[user_entered_data_as_list])
Is there anything I can do to deal with this? Have people come up with a hack to work around this issue?
I worked around this by making the MySQL database itself case-insensitive. I doubt that the people at Django are interested in adding this as a feature or in providing docs on how to provide your own field lookup (assuming that is even possible without providing code for each db backend)
Here is one way to do it, admittedly it is clunky.
products = Product.objects.filter(**normal_filters_here)
results = Product.objects.none()
for d in user_entered_data_as_list:
    results |= products.filter(code__iexact=d)
If your database is MySQL, IN queries are treated case-insensitively (with a case-insensitive collation, which is the usual default). I am not sure about other databases.
Edit 1:
model_name.objects.filter(location__city__name__in=['Tokio', 'Paris'])
will give the following result, in which the city name is
Tokio or TOKIO or tokio or Paris or PARIS or paris
If it won't create conflicts, a possible workaround may be transforming the strings to upper or lowercase both when the object is saved and in the filter.
Here is a solution that does not require case-prepared DB values.
It also does the filtering on the DB engine side, which performs much better than iterating over objects.all().
from functools import reduce
from django.db.models import Q

def case_insensitive_in_filter(fieldname, iterable):
    """returns Q(fieldname__in=iterable) but case insensitive"""
    q_list = map(lambda n: Q(**{fieldname+'__iexact': n}), iterable)
    return reduce(lambda a, b: a | b, q_list)
Another efficient solution is to use extra with the quite portable raw-SQL lower() function:
lowered = [x.lower() for x in iterable]
placeholders = ', '.join(['%s'] * len(lowered))
MyModel.objects.extra(
    where=['lower(' + fieldname + ') IN (' + placeholders + ')'],
    params=lowered,
)
Another solution, albeit crude, is to include the different cases of the original strings in the list argument to the 'in' filter. For example, instead of ['a', 'b', 'c'], use ['a', 'b', 'c', 'A', 'B', 'C'].
Here's a function that builds such a list from a list of strings:
def build_list_for_case_insensitive_query(the_strings):
    results = list()
    for the_string in the_strings:
        results.append(the_string)
        if the_string.upper() not in results:
            results.append(the_string.upper())
        if the_string.lower() not in results:
            results.append(the_string.lower())
    return results
A lookup using Q object can be built to hit the database only once:
from django.db.models import Q
user_inputed_codes = ['eN', 'De', 'FR']
lookup = Q()
for code in user_inputed_codes:
    lookup |= Q(code__iexact=code)
filtered_products = Products.objects.filter(lookup)
A slightly more elegant way would be this (note that it loads every object and filters in Python rather than in the database):
[x for x in Products.objects.all() if x.code.upper() in [y.upper() for y in user_entered_data_as_list]]
You can do it annotating the lowered code and also lowering the entered data
from django.db.models.functions import Lower
Products.objects.annotate(lower_code=Lower('code')).filter(lower_code__in=[user_entered_data_as_list_lowered])
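The lowered list itself is plain Python. A minimal sketch of preparing it, with a hypothetical helper name (lowered_unique) and made-up sample input; dropping duplicates is optional and only keeps the IN list short.

```python
def lowered_unique(values):
    """Lower-case each entry and drop duplicates, keeping first-seen order."""
    return list(dict.fromkeys(value.lower() for value in values))

user_entered_data_as_list = ['eN', 'De', 'FR', 'en']  # sample input
user_entered_data_as_list_lowered = lowered_unique(user_entered_data_as_list)
print(user_entered_data_as_list_lowered)
```

The resulting list can be passed directly as the value of lower_code__in.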