I only know that indexing is helpful and makes queries faster.
What is the difference between the following two?
1.
class Meta:
    indexes = [
        models.Index(fields=['last_name', 'first_name',]),
        models.Index(fields=['-date_of_birth',]),
    ]
2.
class Meta:
    indexes = [
        models.Index(fields=['first_name',]),
        models.Index(fields=['last_name',]),
        models.Index(fields=['-date_of_birth',]),
    ]
Example 1:
The first example creates a single composite index on the last_name and first_name fields.
indexes = [
    models.Index(fields=['last_name', 'first_name',]),
]
It will be useful if you search on the last name and first name together, or the last name by itself (because last_name is the first field in the index).
MyModel.objects.filter(last_name=last_name, first_name=first_name)
MyModel.objects.filter(last_name=last_name)
However, it will not be useful for searching for the first_name by itself (because first_name is not the first field in the index).
MyModel.objects.filter(first_name=first_name) # not useful
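The leftmost-column rule described above can be checked outside Django. The following is a minimal sketch, assuming an in-memory SQLite database; the table name person and index name person_name_idx are made up for illustration, and other databases may plan these queries differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    "id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT, date_of_birth TEXT)"
)
# Composite index, analogous to models.Index(fields=['last_name', 'first_name']).
# The index name is a hypothetical choice for this sketch.
conn.execute("CREATE INDEX person_name_idx ON person (last_name, first_name)")


def query_plan(where_clause):
    """Return SQLite's plan text for SELECT * FROM person WHERE <where_clause>."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM person WHERE " + where_clause
    ).fetchall()
    # The human-readable description is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


plan_both = query_plan("last_name = 'Smith' AND first_name = 'Ann'")
plan_last = query_plan("last_name = 'Smith'")
plan_first = query_plan("first_name = 'Ann'")

print(plan_both)   # mentions person_name_idx
print(plan_last)   # mentions person_name_idx
print(plan_first)  # a plain table scan, no index named
```

In a Django project, calling .explain() on the queryset should give the equivalent information for whichever database backend is configured.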
Example 2:
The second example creates an index for the first_name field and a separate index for the last_name field.
indexes = [
    models.Index(fields=['first_name',]),
    models.Index(fields=['last_name',]),
]
It will be useful if you do lookups based on first name or last name in your code:
MyModel.objects.filter(first_name=search)
MyModel.objects.filter(last_name=search)
Django's model index API (models.Index and Meta.indexes) was introduced in Django 1.11.
What Meta.indexes does:
By default, indexes are created with an ascending order for each column. To define an index with a descending order for a column, add a hyphen before the field’s name.
For example,
models.Index(fields=['last_name', 'first_name', '-date_of_birth']) would create SQL with (last_name, first_name, date_of_birth DESC).
Let's move on to your question.
You asked about the difference between the two definitions.
Neither definition overrides anything: indexes is a list, and every models.Index entry in it becomes its own database index.
The first definition creates two indexes, a composite one on (last_name, first_name) and one on date_of_birth DESC. The second creates three separate single-column indexes.
If your queries filter on all three columns together, the approach shown in the documentation is
to put the fields in a single list, so that Django builds one multi-column index from that list of fields:
models.Index(fields=['last_name', 'first_name', '-date_of_birth']),
Related
I have a model that uses PostgreSQL and has field like this:
class MyModel(models.Model):
json_field = models.JSONField(default=list)
This field contains data like this:
[
{"name": "AAAAA", "product": "11111"},
{"name": "BBBBB", "product": "22222"},
]
Now I want to index by json_field -> product, because it is used as an identifier. So I want to create a GinIndex like this:
class Meta:
    indexes = [
        GinIndex(name='product_json_idx', fields=['json_field->product'], opclasses=['jsonb_path_ops'])
    ]
When I try to create a migration, I get an error like this:
'indexes' refers to the nonexistent field 'json_field->product'.
How can I create a GinIndex that will be used for a child attribute in a JSON array?
Please don't use a JSONField [Django-doc] for well-structured data: if the structure is clear, like here where we have a list of objects where each object has a name and a product, it makes more sense to work with extra models, like:
class MyModel(models.Model):
    # …
    pass

class Product(models.Model):
    # …
    pass

class Entry(models.Model):
    my_model = models.ForeignKey(MyModel, on_delete=models.CASCADE)
    name = models.CharField(max_length=255)
    product = models.ForeignKey(Product, on_delete=models.CASCADE)
This will automatically add indexes on the ForeignKeys, and will also make querying simpler and usually more efficient.
While databases like PostgreSQL have indeed put effort into making JSON columns easier to query, aggregate, etc., it is usually still better to perform database normalization [wiki], especially since it offers more means for referential integrity, and a lot of aggregates are simpler on linear data.
If for example later a product is removed, it will require a lot of work to inspect the JSON blobs to remove that product. This is however a scenario that both Django and PostgreSQL databases cover with ON DELETE triggers and which will likely be more effective and safe when using the Django toolchain for this.
I have a model with one non-nullable CharField and two nullable CharFields:
class MyModel(models.Model):
    name = models.CharField('Name', max_length=255, null=False)
    title = models.CharField('Title', max_length=255, blank=True)
    position = models.CharField('Position', max_length=255, blank=True)
I want to ensure that name, title, and position are unique together, and so use a UniqueConstraint:
class Meta:
    constraints = [
        models.UniqueConstraint(
            fields=['name', 'title', 'position'],
            name="unique_name_title_position"
        ),
    ]
However, if title is None then this constraint fails.
Looking into why, this is because you can insert NULL values into columns with the UNIQUE constraint because NULL is the absence of a value, so it is never equal to other NULL values and not considered a duplicate value. This means that it's possible to insert rows that appear to be duplicates if one of the values is NULL.
What's the correct way to strictly enforce this uniqueness in Django?
A UniqueConstraint is not enforced at the Django level but directly by the database.
PostgreSQL, for instance, allows multiple NULL records under a unique constraint, while MSSQL does not.
You could instead add a conditional UniqueConstraint for each subset of columns that may be NULL, but as you will find out, this is tedious because every combination needs its own unique index.
To avoid this whole mess, you can follow the Django guidance for CharField,
as documented:
Avoid using null on string-based fields such as CharField and
TextField. If a string-based field has null=True, that means it has
two possible values for “no data”: NULL, and the empty string. In most
cases, it’s redundant to have two possible values for “no data;” the
Django convention is to use the empty string, not NULL. One exception
is when a CharField has both unique=True and blank=True set. In this
situation, null=True is required to avoid unique constraint violations
when saving multiple objects with blank values.
Unlike NULL, the empty string is compared for uniqueness, so two rows with an empty title count as duplicates.
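The difference between NULL and the empty string under a unique constraint can be reproduced with a small sketch. It assumes an in-memory SQLite database, which (like PostgreSQL by default) treats NULLs as distinct; the table name mytable and the sample rows are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (name TEXT NOT NULL, title TEXT, position TEXT, "
    "UNIQUE (name, title, position))"
)


def try_insert(row):
    """Insert one (name, title, position) row; return False on a uniqueness violation."""
    try:
        conn.execute("INSERT INTO mytable VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


null_first = try_insert(("Ann", None, "CEO"))
null_second = try_insert(("Ann", None, "CEO"))  # NULL title: not seen as a duplicate
empty_first = try_insert(("Bob", "", "CTO"))
empty_second = try_insert(("Bob", "", "CTO"))   # empty title: rejected as a duplicate

print(null_first, null_second, empty_first, empty_second)
```

This is why storing "no data" as the empty string, as the Django convention suggests, lets an ordinary UniqueConstraint do its job.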
You could pick values that don't ever occur for title and position and use an index like this:
CREATE UNIQUE INDEX ON mytable (
    name,
    coalesce(title, '#impossible#'),
    coalesce(position, '#impossible#')
);
That will replace the NULL values with something else, so that duplicates are prevented.
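As a rough check of the coalesce approach, here is a minimal sketch that builds the same kind of expression index in an in-memory SQLite database. SQLite requires an index name, so mytable_uniq is a made-up name, and '#impossible#' is the placeholder from the answer, assumed never to occur as real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT NOT NULL, title TEXT, position TEXT)")
# Unique index over expressions: NULLs are mapped to a sentinel before comparison.
conn.execute(
    "CREATE UNIQUE INDEX mytable_uniq ON mytable ("
    "name, coalesce(title, '#impossible#'), coalesce(position, '#impossible#'))"
)

conn.execute("INSERT INTO mytable VALUES (?, ?, ?)", ("Ann", None, None))
try:
    conn.execute("INSERT INTO mytable VALUES (?, ?, ?)", ("Ann", None, None))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A row that differs in a non-NULL column is still accepted.
conn.execute("INSERT INTO mytable VALUES (?, ?, ?)", ("Ann", "Dr", None))
row_count = conn.execute("SELECT COUNT(*) FROM mytable").fetchone()[0]
print(duplicate_rejected, row_count)
```

The stored values stay NULL; only the index sees the sentinel, so the choice of placeholder matters only in that it must never collide with a real title or position.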
I have a simple Django model similar to this:
class TestModel(models.Model):
    test_field = LowerCaseCharField(max_length=20, null=False,
                                    verbose_name='Test Field')
    other_test_field = LowerCaseCharField(max_length=20, null=False, unique=True,
                                          verbose_name='Other Test Field')
Notice that other_test_field is a unique field. Now I also have some data stored that looks like this:
[
    {
        test_field: "object1",
        other_test_field: "test1"
    },
    {
        test_field: "object2",
        other_test_field: "test2"
    }
]
All I'm trying to do now is switch the other_test_field fields in these two objects, so that the first object has "test2" and the second object has "test1" for other_test_field. How do I accomplish that while preserving the uniqueness? Ultimately I'm trying to update data in bulk, not just swapping two fields.
Anything that updates data in serial is going to hit an IntegrityError due to unique constraint violation, and I don't know a good way to remove the unique constraint temporarily, for this one operation, before adding it back. Any suggestions?
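The collision described above, and one possible way around it, can be sketched with plain sqlite3. This is only an illustration under stated assumptions: the table mirrors TestModel, the "__tmp__" placeholder prefix is hypothetical and assumed never to occur as a real value, and the two-phase update is one workaround rather than an established answer to the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE testmodel (id INTEGER PRIMARY KEY, test_field TEXT NOT NULL, "
    "other_test_field TEXT NOT NULL UNIQUE)"
)
conn.executemany(
    "INSERT INTO testmodel VALUES (?, ?, ?)",
    [(1, "object1", "test1"), (2, "object2", "test2")],
)
conn.commit()


def set_value(pk, value):
    """Write a new other_test_field value for one row."""
    conn.execute("UPDATE testmodel SET other_test_field = ? WHERE id = ?", (value, pk))


# Naive serial update: the first write collides with the value row 2 still holds.
try:
    set_value(1, "test2")
    naive_failed = False
except sqlite3.IntegrityError:
    naive_failed = True

# Two-phase update in one transaction: park every row on a unique placeholder,
# then write the final values once nothing can collide any more.
new_values = {1: "test2", 2: "test1"}
with conn:  # commits on success, rolls back if any statement raises
    for pk in new_values:
        set_value(pk, "__tmp__%d" % pk)  # hypothetical placeholder prefix
    for pk, value in new_values.items():
        set_value(pk, value)

result = dict(conn.execute("SELECT id, other_test_field FROM testmodel ORDER BY id"))
print(naive_failed, result)
```

In a Django project the same two phases could be run inside transaction.atomic(), so that a failure part-way through leaves the original values in place.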
I'm using a transaction model to keep track of all the events going through the system:
class Transaction(models.Model):
    actor = models.ForeignKey(User, related_name="actor")
    acted = models.ForeignKey(User, related_name="acted", null=True, blank=True)
    action_id = models.IntegerField()
    ......
How do I get the top 5 actors in my system?
In SQL it would basically be
SELECT actor, COUNT(*) as total
FROM Transaction
GROUP BY actor
ORDER BY total DESC
According to the documentation, you should use:
from django.db.models import Count
Transaction.objects.all().values('actor').annotate(total=Count('actor')).order_by('total')
values() : specifies which columns are going to be used to "group by"
Django docs:
"When a values() clause is used to constrain the columns that are
returned in the result set, the method for evaluating annotations is
slightly different. Instead of returning an annotated result for each
result in the original QuerySet, the original results are grouped
according to the unique combinations of the fields specified in the
values() clause"
annotate() : specifies an operation over the grouped values
Django docs:
The second way to generate summary values is to generate an independent summary for each object in a QuerySet. For example, if you
are retrieving a list of books, you may want to know how many authors
contributed to each book. Each Book has a many-to-many relationship
with the Author; we want to summarize this relationship for each book
in the QuerySet.
Per-object summaries can be generated using the annotate() clause.
When an annotate() clause is specified, each object in the QuerySet
will be annotated with the specified values.
The order_by clause is self-explanatory.
To summarize: you group by actor, generating a queryset with one entry per actor, add the annotation (this adds an extra field to the returned values) and finally order the entries by this value.
Refer to https://docs.djangoproject.com/en/dev/topics/db/aggregation/ for more insight
Good to note: when using Count on a field that is never NULL, the field passed to Count does not affect the grouping; the keyword argument only names the resulting value. The aggregator groups by the unique combinations of the fields in values() (as mentioned above), not by the field passed to Count. (Count on a nullable field skips NULLs, so it can give a different total.) The following queries are the same:
Transaction.objects.all().values('actor').annotate(total=Count('actor')).order_by('total')
Transaction.objects.all().values('actor').annotate(total=Count('id')).order_by('total')
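The effect of the column passed to COUNT can be sketched at the SQL level. This is a minimal example assuming an in-memory SQLite table named txn (a made-up stand-in for the Transaction table) with invented rows; it also adds the descending order and LIMIT that a "top 5" query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE txn ("
    "id INTEGER PRIMARY KEY, actor_id INTEGER NOT NULL, acted_id INTEGER)"
)
conn.executemany(
    "INSERT INTO txn (actor_id, acted_id) VALUES (?, ?)",
    [(1, 2), (1, None), (1, 3), (2, 1), (2, None), (3, None)],
)


def top_actors(counted_column, limit=5):
    """Group by actor_id, count the given column, return the biggest totals first."""
    sql = (
        "SELECT actor_id, COUNT({col}) AS total FROM txn "
        "GROUP BY actor_id ORDER BY total DESC, actor_id LIMIT ?"
    ).format(col=counted_column)
    return conn.execute(sql, (limit,)).fetchall()


by_actor = top_actors("actor_id")  # non-null column
by_id = top_actors("id")           # non-null column, same totals
by_acted = top_actors("acted_id")  # nullable column, NULLs are not counted

print(by_actor)
print(by_id)
print(by_acted)
```

The first two results match because neither column contains NULLs; counting the nullable acted_id gives smaller totals, which is the one case where the choice of counted field matters.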
Just as @Alvaro has answered, Django's direct equivalent of the GROUP BY statement:
SELECT actor, COUNT(*) AS total
FROM Transaction
GROUP BY actor
is through the use of values() and annotate() methods as follows:
Transaction.objects.values('actor').annotate(total=Count('actor')).order_by()
However one more thing must be pointed out:
If the model has a default ordering defined in class Meta, the .order_by() clause is obligatory for proper results. You cannot skip it, even when no ordering is intended.
Further, for high-quality code it is advisable to always put an .order_by() clause after annotate(), even when there is no class Meta: ordering. This makes the statement future-proof: it will work as intended regardless of any future changes to class Meta: ordering.
Let me provide you with an example. If the model had:
class Transaction(models.Model):
    actor = models.ForeignKey(User, related_name="actor")
    acted = models.ForeignKey(User, related_name="acted", null=True, blank=True)
    action_id = models.IntegerField()

    class Meta:
        ordering = ['id']
Then this approach WOULDN'T work:
Transaction.objects.values('actor').annotate(total=Count('actor'))
That's because Django (before version 3.1) adds every field in class Meta: ordering to the GROUP BY clause.
If you print the query:
>>> print(Transaction.objects.values('actor').annotate(total=Count('actor')).query)
SELECT "Transaction"."actor_id", COUNT("Transaction"."actor_id") AS "total"
FROM "Transaction"
GROUP BY "Transaction"."actor_id", "Transaction"."id"
It will be clear that the aggregation would NOT work as intended and therefore the .order_by() clause must be used to clear this behaviour and get proper aggregation results.
See: Interaction with default ordering or order_by() in official Django documentation.
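To see why the extra column in GROUP BY breaks the aggregation, the two SQL statements can be compared directly. This is a small sketch assuming an in-memory SQLite table named txn (a hypothetical stand-in for the Transaction table) with invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn (id INTEGER PRIMARY KEY, actor_id INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO txn (actor_id) VALUES (?)",
    [(1,), (1,), (1,), (2,), (2,), (3,)],
)

# Intended aggregation: one row per actor.
grouped_by_actor = conn.execute(
    "SELECT actor_id, COUNT(actor_id) AS total FROM txn "
    "GROUP BY actor_id ORDER BY actor_id"
).fetchall()

# What a default ordering on id leads to: id joins the GROUP BY,
# so every transaction forms its own group with a count of 1.
grouped_with_id = conn.execute(
    "SELECT actor_id, COUNT(actor_id) AS total FROM txn "
    "GROUP BY actor_id, id ORDER BY actor_id, id"
).fetchall()

print(grouped_by_actor)
print(grouped_with_id)
```

Because id is unique, grouping on it as well yields as many groups as there are rows, which is the symptom that the empty .order_by() call removes.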
If you want the reverse order (largest value to smallest), prefix the field name with a minus sign (-).
from django.db.models import Count
Transaction.objects.all().values('actor').annotate(total=Count('actor')).order_by('-total')
My data model consists of three main entities:
class User(models.Model):
    ...

class Source(models.Model):
    user = models.ForeignKey(User, related_name='iuser')
    country = models.ForeignKey(Country, on_delete=models.DO_NOTHING)
    description = models.CharField(max_length=100)

class Destination(models.Model):
    user = models.ForeignKey(User, related_name='wuser')
    country = models.ForeignKey(Country)
I am trying to create a queryset that joins all sources with destinations by user (many to many), so that I would have a table with all possible source/destination combinations for every user.
In SQL I would simply JOIN the three tables and select the appropriate information from each table.
My question is: how do I perform the query, and how do I access the query data?
In Django, queries are made through the model's manager, and this is well documented. QuerySets are lazy; when they are evaluated they normally return model instances, while values() returns a list-like QuerySet of dicts, each mapping a field name to its value, e.g.: [{'user':'albert','country':'US and A :) ','description':'my description'},....].
All possible source,destination combinations for every user?
I think you will have to use a reverse relationship to get this done, e.g.:
my_joined_query = User.objects.values('id', 'iuser__country', 'iuser__description', 'wuser__country')
Notice that I'm using the related_name values of the ForeignKeys to User ('iuser' for Source and 'wuser' for Destination); without a related_name you would use the lowercase model names instead. This joins all three tables. Go through the documentation, it is rich.
Edit:
To make an inner join you have to tell the query so; this can be achieved by using __isnull=False on the reverse relation name:
my_innerjoined_query = User.objects.filter(iuser__isnull=False, wuser__isnull=False)
This should do an inner join on all the tables.
Then you can select what you want to display by using values() as earlier.
hope that helps. :)
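For reference, the SQL-level result the question asks for can be sketched with sqlite3. This assumes simplified versions of the three tables with countries stored as plain text and invented sample rows; it shows the inner join that the filtered queryset is meant to produce, not Django's exact generated SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE source (id INTEGER PRIMARY KEY, user_id INTEGER, country TEXT, description TEXT);
CREATE TABLE destination (id INTEGER PRIMARY KEY, user_id INTEGER, country TEXT);
INSERT INTO user VALUES (1, 'albert'), (2, 'berta');
INSERT INTO source VALUES (1, 1, 'US', 'first'), (2, 1, 'DE', 'second'), (3, 2, 'FR', 'only source');
INSERT INTO destination VALUES (1, 1, 'JP'), (2, 1, 'BR');
""")

# Inner join on user: every source of a user is paired with every destination
# of the same user; users lacking either side drop out of the result.
combinations = conn.execute(
    "SELECT u.name, s.country, s.description, d.country "
    "FROM user u "
    "JOIN source s ON s.user_id = u.id "
    "JOIN destination d ON d.user_id = u.id "
    "ORDER BY s.id, d.id"
).fetchall()

for row in combinations:
    print(row)
```

Here albert has two sources and two destinations, giving four combinations, while berta has no destination and therefore does not appear.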