I have a model that has a field (let's call it this_field) which is stored as a string. The values in this_field are in the form Char### - as in, they have values such as: A4 or A34 or A55 (note that they will always have the same first character).
Is there a way to select all the records and order them by this field? Currently, when I do an .order_by('this_field') I get the order like this:
A1
A13
A2
A25
etc...
What I want is:
A1
A2
A13
A25
etc...
What is the best approach to get this queryset ordered properly?
Queryset ordering is handled by the database backend, not by Django. This limits the options for changing the way that ordering is done. You can either load all of the data and sort it with Python, or add additional options to your query to have the database use some kind of custom sorting by defining functions.
Using the queryset extra() method lets you do what you want by executing custom SQL for sorting, but at the expense of reduced portability.
In your example, it would probably suffice to split the input field into two sets of data, the initial character, and the remaining integer value. You could then apply a sort to both columns. Here's an example (untested):
qs = MyModel.objects.all()
# Add in a couple of extra SELECT columns, pulling apart this_field into
# this_field_a (the character portion) and this_field_b (the integer portion).
qs = qs.extra(select={
    'this_field_a': "SUBSTR(this_field, 1, 1)",
    'this_field_b': "CAST(SUBSTR(this_field, 2) AS UNSIGNED)"})
The extra() call adds two new columns to the SELECT clause. The first expression pulls out the first character of the field; the second converts the remainder of the field to an integer.
It should now be possible to order_by on these fields. By specifying two fields to order_by, ordering applies to the character field first, then to the integer field.
For example:
qs = qs.order_by('this_field_a', 'this_field_b')
This example should work on both MySQL and SQLite (PostgreSQL has no UNSIGNED type, so there you would cast AS INTEGER instead). It should also be possible to create a single extra field which is used only for sorting, allowing you to specify just a single field in the order_by() call.
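To see what the database does with those expressions, here is a small sketch using Python's built-in sqlite3 module against an in-memory table. The table name app_mymodel and the sample values are assumptions for illustration; the query puts the same SUBSTR/CAST expressions directly in ORDER BY, which is the "sort-only" variant mentioned above (SQLite spells the cast target INTEGER). On Django 1.10 or later, the same ordering should also be expressible without extra(), e.g. .annotate(num=Cast(Substr('this_field', 2), IntegerField())).order_by('num'), using Cast and Substr from django.db.models.functions.

```python
import sqlite3


def natural_order(values):
    """Return values ordered by first character, then by the integer after it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE app_mymodel (this_field TEXT)")
    conn.executemany(
        "INSERT INTO app_mymodel (this_field) VALUES (?)",
        [(v,) for v in values],
    )
    rows = conn.execute(
        "SELECT this_field FROM app_mymodel "
        "ORDER BY SUBSTR(this_field, 1, 1), "      # the leading character
        "CAST(SUBSTR(this_field, 2) AS INTEGER)"  # the numeric remainder
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(natural_order(["A13", "A2", "A25", "A1"]))  # ['A1', 'A2', 'A13', 'A25']
```

A plain ORDER BY this_field on the same table would return A1, A13, A2, A25, i.e. the string ordering from the question.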
If you use this sort order often, or on large tables, consider storing the two parts in separate fields:
the alpha values should be lowercased only, in a text or char field, with db_index=True set
the numeric values should be in an integer field with db_index=True set on them.
qs.order_by('alpha_sort_field', 'numeric_sort_field')
Otherwise you will probably see a performance hit, possibly a large one.
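A minimal sketch of how the two sort values could be derived from a code like A34 before the row is written. The helper name split_code and the assumed format (letters followed by digits, matching the Char### form in the question) are hypothetical, not part of the original answer.

```python
import re

# Assumed format: one or more letters followed by one or more digits, e.g. "A34".
CODE_RE = re.compile(r"([A-Za-z]+)(\d+)")


def split_code(code):
    """Split a code like 'A34' into ('a', 34) for the two separate sort fields.

    The alphabetic part is lowercased, as suggested above. Raises ValueError
    if the code does not match the assumed format.
    """
    match = CODE_RE.fullmatch(code)
    if match is None:
        raise ValueError(f"unexpected code format: {code!r}")
    return match.group(1).lower(), int(match.group(2))
```

Inside a save() override, something like self.alpha_sort_field, self.numeric_sort_field = split_code(self.this_field) would keep both indexed fields in sync (field names taken from the order_by example above).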
Another way of doing it is to sort the QuerySet based on the int part of this_field:
qs = ModelClass.objects.all()
# sorted() returns a plain list, not a QuerySet
sorted_qs = sorted(qs, key=lambda obj: int(obj.this_field[1:]))
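The key above assumes exactly one leading character. If the prefix length can vary, a common "natural sort" key splits the string into text and digit runs and compares the digit runs as integers. This is a generic sketch (the name natural_key is hypothetical), not something from the original answer.

```python
import re


def natural_key(text):
    """Key that compares digit runs as integers: 'AB12c3' -> ['ab', 12, 'c', 3, '']."""
    # re.split with a capturing group alternates text / digit pieces,
    # so odd positions are always digit runs and even positions are text.
    parts = re.split(r"(\d+)", text)
    return [int(part) if i % 2 else part.lower() for i, part in enumerate(parts)]
```

Used on a queryset, it would look like sorted(qs, key=lambda obj: natural_key(obj.this_field)). Because text and numbers always sit at the same positions, the lists never compare an int with a str.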
In my app, I have a document number which consists of several fields of Document model like:
{{doc_code}}{{doc_num}}-{{doc_year}}
doc_num is an integer in the model, but for the user it is a five-digit string padded with leading zeros, like 00024 or 00573.
doc_year is a date field in the model, but in the full document number it is shown as the last two digits of the year.
So a document number as the user sees it is, for example, TR123.00043-22.
I want to implement searching on the documents list page.
One approach is to autogenerate a full_number field from the doc_code, doc_num and doc_year fields in the save method of the Document model, and filter on this full_number.
Another is to use the Concat function to annotate the queryset before filtering.
First, concatenate into a full_code annotation:
from django.db.models import CharField, Value
from django.db.models.functions import Concat
docs = Document.objects.annotate(full_code=Concat('doc_code', 'doc_num', Value('-'), 'doc_year', output_field=CharField()))
and then filter on the full_code field:
docs = docs.filter(full_code__icontains=keyword)
But how do I pass doc_num as a five-digit string and doc_year as the last two digits of the year to the Concat function?
Or what could be a better solution for this task?
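Concat only joins the expressions you pass it; it won't zero-pad doc_num or trim doc_year for you. That formatting would have to come from other database functions (Django's LPad, Cast and Right, for example) or be done in Python.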
As you note, you can set an extra field on save. That's probably the best approach if you are going to be using it in multiple places.
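The save method would look something like this:
def save(self, *args, **kwargs):
    # Build the display number before saving, so a single write stores it
    # (calling self.save() again from inside save() would recurse forever).
    self.full_code = f"{self.doc_code}{self.doc_num:05d}-{self.doc_year.strftime('%y')}"
    super().save(*args, **kwargs)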
The f-string syntax requires Python 3.6 or later; other methods for earlier Pythons can be seen here.
This assumes doc_year is a date or datetime. If it is just a four-digit int, something like str(self.doc_year)[-2:] should work instead.
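Here is a small sketch of the same formatting logic pulled out into a plain function, so it can be checked outside Django. The name build_full_code is hypothetical; it accepts either a date/datetime or a four-digit int for doc_year, covering both cases above, and it assumes doc_code already includes the trailing dot (as in TR123.).

```python
from datetime import date


def build_full_code(doc_code, doc_num, doc_year):
    """Return e.g. 'TR123.00043-22' from ('TR123.', 43, date(2022, 5, 1))."""
    if isinstance(doc_year, date):           # date and datetime both pass
        year_part = doc_year.strftime("%y")  # last two digits of the year
    else:
        year_part = str(doc_year)[-2:]       # assumed four-digit int, e.g. 2022
    return f"{doc_code}{doc_num:05d}-{year_part}"
```

Keeping the formatting in one helper means save() and any ad hoc loop produce identical strings.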
Alternatively, if you are only ever going to use it rarely, you could loop through your queryset, adding an extra attribute to each object:
docs = Document.objects.all()  # or whatever filter is appropriate
for doc in docs:
    doc.full_code = f"{doc.doc_code}{doc.doc_num:05d}-{doc.doc_year.strftime('%y')}"
    # or f"{doc.doc_code}{doc.doc_num:05d}-{str(doc.doc_year)[-2:]}" if doc_year is not a date
Then convert it to a list (this reuses the rows the loop already fetched, so there is no new DB query that would lose your added attribute) and filter it with a list comprehension:
filtered_docs = [x for x in list(docs) if search_term in x.full_code]
Pass filtered_docs to your template and away you go.
Can anyone help? I want to return an ordered list for a for loop in a Django template, ordered by a model field that contains both a string and an integer in the format MM/1234. The loop should show the values in ascending order of the integer part (1234) in the HTML template.
Ideally you want to change the model to have two fields, one integer and one string, so you can code a queryset with ordering based on the integer one. You can then define a property of the model to return the self.MM+"/"+str( self.nn) composite value if you often need to use that. But if it's somebody else's database schema, this may not be an option.
In which case you'll have to convert your queryset into a list (which reads all the data rows at once) and then sort the list in Python rather than in the database. You can run out of memory or bring your server to its knees if the list contains millions of objects. count = qs.count(), by contrast, is a single DB query that won't.
qs = Foo.objects.filter(your_selection_criteria)
# you might want to count this before the next step, and chicken out if too many
# also this simple key function will crash if there's ever no "/" in that_field,
# or if the part after it isn't a number; int() makes 99 sort before 1234
all_obj = sorted(qs,
                 key=lambda obj: int(obj.that_field.split('/')[1]))
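If some rows might lack the "/" or have a non-numeric suffix, one option is a key that pushes those rows to the end instead of crashing. This is a sketch; the helper name suffix_key is hypothetical.

```python
def suffix_key(value):
    """Sort key for strings like 'MM/1234': numeric part first, malformed values last.

    Returns a tuple so well-formed values (flag 0) sort before malformed ones
    (flag 1); ties on the number fall back to the full string.
    """
    _, sep, suffix = value.partition("/")
    if sep:
        try:
            return (0, int(suffix), value)
        except ValueError:
            pass  # not a number after the "/"
    return (1, 0, value)
```

With it, the sort becomes sorted(qs, key=lambda obj: suffix_key(obj.that_field)).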
With the following Django model:
from django.db import models
class Item(models.Model):
    name = models.CharField(max_length=256)
    description = models.TextField()
I need to formulate a filter method that takes a list of n words (word_list) and returns the queryset of Items where each word in word_list can be found, either in the name or the description.
To do this with a single field is straightforward enough. Using the reduce technique described here (this could also be done with a for loop), this looks like:
import operator
from functools import reduce
from django.db.models import Q
q = reduce(operator.and_, (Q(description__contains=word) for word in word_list))
Item.objects.filter(q)
I want to do the same thing but take into account that each word can appear either in the name or the description. I basically want to query the concatenation of the two fields, for each word. Can this be done?
I have read that there is a concatenation operator in PostgreSQL, ||, but I am not sure whether it can be used in Django to achieve this.
As a last resort, I can create a third column that contains the combination of the two fields and maintain it via post_save signal handlers and/or save method overrides, but I'm wondering whether I can do this on the fly without maintaining this type of "search index" type of column.
The most straightforward way would be to use Q to do an OR:
from django.db.models import Q
lookups = [Q(name__contains=word) | Q(description__contains=word)
           for word in word_list]
Item.objects.filter(*lookups)  # the same as and'ing them together
I can't speak to the performance of this solution as compared to your other two options (raw SQL concatenation or denormalization), but it's definitely simpler.
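To make the semantics of those lookups concrete: every word must match somewhere (AND across words), and for a given word either field may match (OR across fields). The plain-Python predicate below mirrors that logic on in-memory objects; the dataclass is a hypothetical stand-in for the model. Like __contains, it uses case-sensitive substring tests, and unlike a search over the concatenated text, a word split across the end of name and the start of description will not match.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for the Django model, for illustration only."""
    name: str
    description: str


def matches_all_words(item, word_list):
    # AND across words; for each word, OR across the two fields.
    return all(word in item.name or word in item.description for word in word_list)
```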
I have a model that has four fields. How do I remove duplicate objects from my database?
Daniel Roseman's answer to this question seems appropriate, but I'm not sure how to extend it to a situation where there are four fields to compare per object.
Thanks,
W.
from django.db import models
def remove_duplicated_records(model, fields):
    """
    Removes records from `model` duplicated on `fields`
    while leaving the most recent one (biggest `id`).
    """
    duplicates = model.objects.values(*fields)
    # override any model specific ordering (for `.annotate()`)
    duplicates = duplicates.order_by()
    # group by same values of `fields`; count how many rows are the same
    duplicates = duplicates.annotate(
        max_id=models.Max("id"), count_id=models.Count("id")
    )
    # keep only the groups which are actually duplicated
    duplicates = duplicates.filter(count_id__gt=1)
    for duplicate in duplicates:
        to_delete = model.objects.filter(**{x: duplicate[x] for x in fields})
        # leave out the latest duplicated record
        # you can use `Min` if you wish to leave out the first record
        to_delete = to_delete.exclude(id=duplicate["max_id"])
        to_delete.delete()
You shouldn't need to do this often. Use a unique_together constraint in the database instead.
This leaves the record with the biggest id in the DB. If you want to keep the original record (the first one), modify the code a bit to use models.Min. You can also use a completely different field, like a creation date.
Underlying SQL
When annotating, the Django ORM adds a GROUP BY clause on the fields selected in the query, which is why .values() is used first. GROUP BY groups together all records whose values for those fields are identical. The duplicated groups (more than one id for the same field values) are then selected by the HAVING clause that .filter() generates on the annotated QuerySet.
SELECT
    field_1,
    …
    field_n,
    MAX(id) AS max_id,
    COUNT(id) AS count_id
FROM
    app_mymodel
GROUP BY
    field_1,
    …
    field_n
HAVING
    COUNT(id) > 1
The duplicated records are then deleted in the for loop, except for the one with the highest id in each group.
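The same GROUP BY / HAVING logic can be tried directly against SQLite with Python's sqlite3 module. This sketch assumes a table app_mymodel with two comparison columns, a and b (stand-ins for field_1 … field_n), and, like the Django function above, keeps the row with the highest id in each group. One difference: the plain a = ? comparison never matches NULL, so rows with NULL in a or b are left alone, whereas the ORM's filter(field=None) becomes IS NULL.

```python
import sqlite3


def remove_duplicates_sqlite(conn):
    """Delete rows duplicated on (a, b), keeping the one with the largest id."""
    groups = conn.execute(
        "SELECT a, b, MAX(id) AS max_id, COUNT(id) AS count_id "
        "FROM app_mymodel GROUP BY a, b HAVING COUNT(id) > 1"
    ).fetchall()
    for a, b, max_id, _count in groups:
        # Same idea as the ORM loop: delete the group except its max id.
        conn.execute(
            "DELETE FROM app_mymodel WHERE a = ? AND b = ? AND id <> ?",
            (a, b, max_id),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app_mymodel (id INTEGER PRIMARY KEY, a TEXT, b INTEGER)")
conn.executemany(
    "INSERT INTO app_mymodel (a, b) VALUES (?, ?)",
    [("x", 1), ("x", 1), ("y", 2), ("x", 1), ("y", 3)],
)
remove_duplicates_sqlite(conn)
print(conn.execute("SELECT id, a, b FROM app_mymodel ORDER BY id").fetchall())
# [(3, 'y', 2), (4, 'x', 1), (5, 'y', 3)]
```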
Empty .order_by()
Just to be sure, it's always wise to add an empty .order_by() call before aggregating a QuerySet.
Fields used for ordering the QuerySet are also included in the GROUP BY clause. An empty .order_by() clears the ordering declared in the model's Meta, so those columns are left out of the SQL query (otherwise, e.g., a default ordering by date can ruin the results).
You might not need to override it right now, but someone might add a default ordering later and thereby ruin your precious delete-duplicates code without even knowing it. Yes, I'm sure you have 100% test coverage…
Just add empty .order_by() to be safe. ;-)
https://docs.djangoproject.com/en/3.2/topics/db/aggregation/#interaction-with-default-ordering-or-order-by
Transaction
Of course you should consider doing it all in a single transaction.
https://docs.djangoproject.com/en/3.2/topics/db/transactions/#django.db.transaction.atomic
If you want to delete duplicates on single or multiple columns, you don't need to iterate over millions of records.
Fetch the columns that define uniqueness (don't forget to include the primary key column):
fetch = Model.objects.all().values("id", "skuid", "review", "date_time")
Load the result into pandas (I used pandas here instead of an ORM query):
import pandas as pd
df = pd.DataFrame.from_dict(fetch)
Drop duplicates on unique columns
uniq_df = df.drop_duplicates(subset=["skuid", "review", "date_time"])
# don't put the primary key in subset: every id is unique, so nothing would be dropped
Now you have the unique records, from which you can pick the primary keys:
primary_keys = uniq_df["id"].tolist()
Finally, it's showtime: exclude those ids and delete the rest of the data:
records = Model.objects.all().exclude(pk__in=primary_keys).delete()
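For reference, the keep-first selection that drop_duplicates performs (its default is keep="first") can be sketched in plain Python without pandas. The row dicts mirror what .values("id", "skuid", "review", "date_time") returns; the helper name ids_to_keep is hypothetical. Since "first" depends on row order, adding .order_by("id") to the fetch would make the kept row deterministic.

```python
def ids_to_keep(rows, subset):
    """Return the id of the first row seen for each distinct combination of `subset`."""
    seen = set()
    keep = []
    for row in rows:
        key = tuple(row[field] for field in subset)
        if key not in seen:
            seen.add(key)
            keep.append(row["id"])
    return keep
```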
The model has an IntegerField with null=True, blank=True, so it allows the field to have a value of None. When the column is sorted, the Nones are always treated as the lowest values. Is it possible to make Django treat them as the highest values? So when sorted in descending order, instead of:
100 43 7 None
It would go:
None 100 43 7
I realize I could assign an extremely high number instead of None, but for neatness' sake, I was wondering if there were any other options.
Thanks!
Daniel is correct in that the database determines the sort order, and different databases treat the ordering of NULLs differently. However, there are ways to get the DB to give you the order you want.
PostgreSQL is nice enough to actually allow you to append "NULLS FIRST" or "NULLS LAST" to your query (ref).
SELECT * FROM table_name ORDER BY int_field DESC NULLS FIRST;
For MySQL and SQLite, there's an alternative (which will also work in PostgreSQL), as described here. Essentially, if you want nulls first (as in your descending example), you would do:
SELECT *, int_field IS NULL AS is_null FROM table_name ORDER BY is_null DESC, int_field DESC;
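Here is that query run against an in-memory SQLite table with Python's sqlite3 module, using the values from the question; table_name and int_field are the placeholders from the SQL above. (As an aside, Django 1.11 and later can express this without raw SQL: order_by(F('int_field').desc(nulls_first=True)).)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (int_field INTEGER)")
conn.executemany("INSERT INTO table_name (int_field) VALUES (?)",
                 [(7,), (None,), (100,), (43,)])

# `int_field IS NULL` is 1 for NULL rows and 0 otherwise, so sorting it
# DESC puts the NULL rows first; int_field DESC then orders the rest.
rows = conn.execute(
    "SELECT int_field, int_field IS NULL AS is_null FROM table_name "
    "ORDER BY is_null DESC, int_field DESC"
).fetchall()
print([value for value, _ in rows])  # [None, 100, 43, 7]
```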
However, getting Django to execute these queries is a different story altogether. In Django 1.2, model managers now have a raw() method, documented here, which returns a RawQuerySet, which is like a QuerySet but can't be stacked (e.g. you can't add a filter() call in there). Of course, instead of stacking, you can just add your lookup parameters to the SQL. Whether or not this functionality is useful to you depends on what you're trying to accomplish. If you simply want to fetch your models in that order then pass the queryset to a view or something, you can do:
YourModel.objects.raw('SELECT *, int_field IS NULL AS is_null FROM table_name ORDER BY is_null DESC, int_field DESC')
If however you want this to be the default ordering for use in the admin and such, you'll need a different approach, perhaps via overriding the manager.
It's not Django that determines the sort order, but the database. And databases have their own rules about how to sort NULLs.
One (rather complicated) possibility would be to implement a custom field that uses a custom database type to sort in the correct order. The details of how you would do this are likely to depend on your database.
Since I'm using Django 1.1 and couldn't use raw(), the simplest way turned out to be to create an "int_sort" field and populate it with the value of the IntegerField, unless it encountered a None, in which case it would take the value of sys.maxint.
Then, in admin.py, I set the admin_order_field to be the "int_sort" field.
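A sketch of how the int_sort value could be computed. One caveat worth checking: on Python 3, sys.maxint no longer exists (sys.maxsize is the closest equivalent), and on most 64-bit platforms either value exceeds what a regular IntegerField column can hold; Django documents -2147483648 to 2147483647 as the range that is safe on all supported databases. The sketch therefore uses 2**31 - 1 as the sentinel, assuming int_sort is a plain IntegerField and real values stay below it. The names INT_SORT_NULL and int_sort_value are hypothetical.

```python
# Largest value a Django IntegerField stores safely on every supported database.
INT_SORT_NULL = 2**31 - 1


def int_sort_value(value):
    """Map None to the sentinel so NULLs sort last ascending and first descending."""
    return INT_SORT_NULL if value is None else value
```

In the model's save(), self.int_sort = int_sort_value(self.int_field) would keep the sort column in step with the real field (int_field being the placeholder name used in the SQL above).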