How to optimize multi-column indexing in Django with PostgreSQL

In my application I have this model:
class ObservedData(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    unique_id = models.CharField(max_length=128)
    timestamp = models.DateTimeField()
    value = models.FloatField(null=True)
90% of my queries look like this:
select * from ObservedData where user=<user> AND unique_id=<uid> AND timestamp BETWEEN <date1> AND <date2>
This is how I am indexing:
class Meta:
    indexes = [
        models.Index(fields=['user', 'unique_id', 'timestamp']),
        models.Index(fields=['user', 'unique_id']),
        models.Index(fields=['unique_id', 'timestamp']),
    ]
    unique_together = ('user', 'unique_id', 'timestamp')
Is this the correct way to do it? I have noticed that the database's allocated space is growing faster, and I am wondering whether these indexes are excessive or overlap unnecessarily.
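One way to check for overlap is to ask the database which index it picks for the common query. The sketch below is only an illustration under stated assumptions: it uses SQLite from the Python standard library as a stand-in for PostgreSQL, and the table and column names are guesses at what Django would generate. In both databases a unique constraint is backed by an index, and a B-tree index on (user, unique_id, timestamp) can also serve lookups on its leading columns (user, unique_id), so the first two explicit indexes may be redundant. On PostgreSQL the equivalent check would be EXPLAIN on the real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE observeddata (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        unique_id TEXT NOT NULL,
        timestamp TEXT NOT NULL,
        value REAL,
        UNIQUE (user_id, unique_id, timestamp)  -- mirrors unique_together
    )
""")

# Ask the planner how it would run the "90%" query when the only
# index available is the one backing the unique constraint.
plan = conn.execute(
    """
    EXPLAIN QUERY PLAN
    SELECT * FROM observeddata
    WHERE user_id = ? AND unique_id = ?
      AND timestamp BETWEEN ? AND ?
    """,
    (1, "sensor-a", "2024-01-01", "2024-01-31"),
).fetchall()

# The last column of each plan row is the human-readable detail.
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

If the plan names the automatically created unique index, the query is already covered without any of the explicit models.Index entries.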

Related

Why does Django take so long to build the result list from a QuerySet?

I am studying the Django ORM. I couldn't find an answer by searching, so I'd appreciate it if someone could point me to a relevant resource.
My models are as follows. user1 has 2 accounts, and 500,000 transactions belong to one of the accounts.
class Account(models.Model):
    class Meta:
        db_table = 'account'
        ordering = ['created_at']
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    account = models.CharField(max_length=20, null=False, blank=False, primary_key=True)
    balance = models.PositiveBigIntegerField(default=0)
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)
class AccountTransaction(models.Model):
    class Meta:
        db_table = 'account_transaction'
        ordering = ['tran_time']
        indexes = [
            models.Index(fields=['tran_type', 'tran_time']),
        ]
    account = models.ForeignKey(Account, on_delete=models.CASCADE)
    tran_amt = models.PositiveBigIntegerField()
    balance = models.PositiveBigIntegerField()
    tran_type = models.CharField(max_length=10, null=False, blank=False)
    tran_detail = models.CharField(max_length=100, null=True, default="")
    tran_time = models.DateTimeField(auto_now_add=True)
The query time for the above model is as follows.
start = time.time()
rs = request.user.account_set.all().get(account="0000000010").accounttransaction_set.all()
count = rs.count()
print('>>all')
print(time.time() - start) # 0.028000831604003906
start = time.time()
q = Q(tran_time__date__range = ("2000-01-01", "2000-01-03"))
rs = request.user.account_set.all().get(account="0000000010").accounttransaction_set.filter(q)
print('>>filter')
print(time.time() - start) # 0.0019981861114501953
start = time.time()
result = list(rs)
print('>>offset')
print(time.time() - start) # 5.4373579025268555
The filtered QuerySet contains about 3,500 rows in total (3,500 of the 500,000 records were selected).
I've tried a number of things, such as slicing the QuerySet (rs) with an offset, but it still takes a long time to get the actual values out of it.
I know that a QuerySet only loads data when its actual values are needed, as with count(), but what did I do wrong?
From https://docs.djangoproject.com/en/4.1/topics/db/queries/#querysets-are-lazy:
QuerySets are lazy – the act of creating a QuerySet doesn’t involve
any database activity. You can stack filters together all day long,
and Django won’t actually run the query until the QuerySet is
evaluated. Take a look at this example:
q = Entry.objects.filter(headline__startswith="What")
q = q.filter(pub_date__lte=datetime.date.today())
q = q.exclude(body_text__icontains="food")
print(q)
Though this looks like three database hits, in fact it hits the
database only once, at the last line (print(q)). In general, the
results of a QuerySet aren’t fetched from the database until you “ask”
for them. When you do, the QuerySet is evaluated by accessing the
database. For more details on exactly when evaluation takes place, see
When QuerySets are evaluated.
In your example, the filtered query is only sent to the database when you call list(rs), which is why that step is the one that takes so long.
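The deferred behaviour can be illustrated without Django. The following is a minimal sketch: LazyQuery is a hypothetical toy class, not Django's QuerySet, and it scans an in-memory range instead of querying a database. It only shows that chaining filters is free and that the cost appears when the object is iterated.

```python
class LazyQuery:
    """Toy stand-in for a lazy query object (hypothetical, not Django's QuerySet)."""

    def __init__(self, rows, predicates=()):
        self._rows = rows
        self._predicates = tuple(predicates)
        self.evaluations = 0  # how many times the data was actually scanned

    def filter(self, predicate):
        # Chaining only records the predicate; no rows are touched here.
        return LazyQuery(self._rows, self._predicates + (predicate,))

    def __iter__(self):
        # Iteration (for example list(q)) is where the work finally happens.
        self.evaluations += 1
        for row in self._rows:
            if all(p(row) for p in self._predicates):
                yield row


rows = range(500_000)
q = LazyQuery(rows).filter(lambda n: n % 2 == 0).filter(lambda n: n < 10)
evaluations_before = q.evaluations  # nothing has been scanned yet
result = list(q)                    # the full scan runs here
evaluations_after = q.evaluations
```

Timing the filter() calls in this sketch would show almost nothing, while list(q) pays for the whole scan, which mirrors the timings in the question.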

How to store 2D Array in model of Django

Problem:
I need to store a 2D array in one of the fields of a table.
Example:
id:1
Teacher_name:"Amit"
time: [[9:00am, 2:00pm], [2:00pm,6:00pm], [6:00am, 9:00pm]] # need to store a 2D array of multiple time ranges in a single field
The code is here:
models.py
class ScheduleClassification(models.Model):
    vendor_id = models.ForeignKey(Vendor, on_delete=models.CASCADE, default=None, null=True)
    id = models.IntegerField(primary_key=True)
    name = models.CharField(max_length=100)
    description = models.CharField(max_length=1000)
    start_date = models.DateField(auto_now_add=True)
    end_date = models.DateField(auto_now_add=True)
    duration = models.CharField(max_length=20, default="forever")
    day_type = models.CharField(max_length=20)
    time = ...  # how do I define this field?
How can I store this? Please let me know the best way to do this in Django models.
First, storing the whole array in a single field isn't a good approach. Instead, create a new model that holds the start and end time of each lesson and links it to the teacher and the schedule:
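class Lesson(models.Model):
    start = models.DateTimeField()
    end = models.DateTimeField()
    duration = models.CharField(max_length=20, default="forever")
    # Teacher and Schedule are assumed to be existing models in your app
    teacher = models.ForeignKey(Teacher, on_delete=models.CASCADE)
    schedule = models.ForeignKey(Schedule, on_delete=models.CASCADE)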
This is a better approach, and you can add a method that builds the array you want:
# in the Schedule class
def create_array(self):
    array = []
    qs = Lesson.objects.filter(schedule=self)
    for lesson in qs.iterator():
        # one [start, end] pair per lesson
        array.append([lesson.start, lesson.end])
    return array
You can also turn this method into a property so that it reads like an attribute.
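As a rough illustration of the property idea, here is a minimal sketch that uses plain dataclasses in place of Django models so that it runs on its own. The names Schedule.lessons and time_slots are hypothetical; in a real model the property would filter Lesson objects as in create_array above.

```python
from dataclasses import dataclass, field
from datetime import time


@dataclass
class Lesson:
    start: time
    end: time


@dataclass
class Schedule:
    # Stands in for the Lesson rows related to this schedule.
    lessons: list = field(default_factory=list)

    @property
    def time_slots(self):
        # Same idea as create_array(), but accessed like an attribute.
        return [[lesson.start, lesson.end] for lesson in self.lessons]


schedule = Schedule(lessons=[
    Lesson(time(9, 0), time(14, 0)),
    Lesson(time(14, 0), time(18, 0)),
])
slots = schedule.time_slots  # no parentheses needed
```

The 2D array is rebuilt from the lesson records on each access, so there is nothing extra to keep in sync in the database.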

Archive records and re-inserting new records in Django?

I've got a Stock table and a StockArchive table.
My Stock table consists of roughly 10,000 stocks that I update daily. The reason I have a StockArchive table is that I still want to keep some historic data rather than just updating existing records. My question is: is this a proper way of doing it?
First, my models:
class Stock(models.Model):
    objects = BulkUpdateOrCreateQuerySet.as_manager()
    stock = models.CharField(max_length=200)
    ticker = models.CharField(max_length=200)
    exchange = models.ForeignKey(Exchange, on_delete=models.DO_NOTHING)
    eod_price = models.DecimalField(max_digits=12, decimal_places=4)
    currency = models.CharField(max_length=20, blank=True, null=True)
    last_modified = models.DateTimeField(blank=True, null=True)
    class Meta:
        db_table = "stock"
class StockArchive(models.Model):
    objects = BulkUpdateOrCreateQuerySet.as_manager()
    stock = models.ForeignKey(Stock, on_delete=models.DO_NOTHING)
    eod_price = models.DecimalField(max_digits=12, decimal_places=4)
    archive_date = models.DateField()
    class Meta:
        db_table = "stock_archive"
I proceed to do the following:
@transaction.atomic
def my_func():
    archive_stocks = []
    batch_size = 100
    old_stocks = Stock.objects.all()
    for stock in old_stocks:
        archive_stocks.append(
            StockArchive(
                stock=stock,  # StockArchive.stock is a ForeignKey, so pass the Stock instance
                eod_price=stock.eod_price,
                archive_date=date.today(),
            )
        )
    # insert into stock archive table
    StockArchive.objects.bulk_create(archive_stocks, batch_size)
    # delete stock table
    Stock.objects.all().delete()
    # proceed to bulk_insert new stocks
I also wrapped the function with @transaction.atomic to make sure that either everything is committed or nothing is, rather than just part of the work.
Is my thought process correct, or should I do something differently? Perhaps more efficient?
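The all-or-nothing behaviour that the atomic wrapper is meant to give can be sketched with the standard-library sqlite3 module. This is an illustration only, not the Django code: the table layout is simplified, and archive_and_clear and its fail flag are hypothetical names used to simulate an error halfway through.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock (id INTEGER PRIMARY KEY, ticker TEXT, eod_price REAL);
CREATE TABLE stock_archive (stock_id INTEGER, eod_price REAL, archive_date TEXT);
INSERT INTO stock (ticker, eod_price) VALUES ('AAA', 10.0), ('BBB', 20.0);
""")


def archive_and_clear(conn, fail=False):
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "INSERT INTO stock_archive SELECT id, eod_price, ? FROM stock",
            (date.today().isoformat(),),
        )
        if fail:
            raise RuntimeError("simulated failure between archive and delete")
        conn.execute("DELETE FROM stock")


def count(conn, table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


try:
    archive_and_clear(conn, fail=True)
except RuntimeError:
    pass
after_failure = (count(conn, "stock"), count(conn, "stock_archive"))

archive_and_clear(conn)
after_success = (count(conn, "stock"), count(conn, "stock_archive"))
```

After the simulated failure the archive insert is rolled back along with everything else, so the two tables never end up half-updated.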

Summing counts over records with the same value in Django

I am trying to run a query that sums over records in the database that share the same value. It is clearer to show code than to describe it in words.
class Track(models.Model):
    title = models.CharField()
    isrc = models.CharField()
    ...
class Playlog(models.Model):
    track = models.ForeignKey(Track, on_delete=models.CASCADE)
    ....
In the database there are multiple records with the same isrc value. To get the real playlog data, I need the total playlog count across all Tracks that share an isrc. I tried the following query, but it returns duplicated Track values. Where several tracks have the same isrc, I want the sum of all their playlogs.
Playlog.objects.values("track__isrc").annotate(Count("track__playlog", filter=Q(track__playlog__duration__gte=10)))
You should add a .order_by('track__isrc') to force it to "fold". You should also count on the model, not on a related model:
Playlog.objects.values('track__isrc').annotate(
    Count('pk', filter=Q(duration__gte=10))
).order_by('track__isrc')
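To make the grouping concrete, here is a small sketch of roughly the grouped count that this ORM expression asks for, written as plain SQL against an in-memory SQLite database. The table names, sample rows, and the plays alias are made up for illustration, and the SQL Django actually generates may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE track (id INTEGER PRIMARY KEY, title TEXT, isrc TEXT);
CREATE TABLE playlog (id INTEGER PRIMARY KEY, track_id INTEGER, duration INTEGER);
-- tracks 1 and 2 share the isrc 'A'
INSERT INTO track (id, title, isrc) VALUES (1, 't1', 'A'), (2, 't2', 'A'), (3, 't3', 'B');
INSERT INTO playlog (track_id, duration) VALUES (1, 30), (2, 12), (2, 5), (3, 45);
""")

# One output row per isrc; only playlogs of at least 10 seconds are counted.
rows = conn.execute("""
    SELECT track.isrc,
           COUNT(CASE WHEN playlog.duration >= 10 THEN playlog.id END) AS plays
    FROM playlog
    JOIN track ON track.id = playlog.track_id
    GROUP BY track.isrc
    ORDER BY track.isrc
""").fetchall()
print(rows)  # [('A', 2), ('B', 1)]
```

Tracks 1 and 2 collapse into a single row for isrc 'A', which is the "folding" that grouping by track__isrc alone is meant to achieve.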
That being said, if the same isrc value means it is the same product, it is better to create a model for it and add a ForeignKey to that product, for example:
class Product(models.Model):
    isrc = models.CharField(max_length=128, unique=True)
    # …
class Track(models.Model):
    title = models.CharField(max_length=128)
    isrc = models.ForeignKey(Product, on_delete=models.CASCADE)
    # …
class Playlog(models.Model):
    track = models.ForeignKey(Track, on_delete=models.CASCADE)
    duration = models.IntegerField()
    # …
Then you can annotate the Product, for example:
from django.db.models import Count, Q
Product.objects.annotate(
    total_play=Count('track__playlog', filter=Q(track__playlog__duration__gte=10))
)

Edit django 'through' model inline in wagtail admin?

[Edited with better code sample]
As per the title I am trying to allow for inline editing for a very simple shop page in Wagtail (will probably make it into a simple package):
With the following models:
class Product(ClusterableModel):
    page = ParentalKey(MiniShopPage, on_delete=models.CASCADE, related_name='shop_products')
    name = models.CharField(max_length=255)
    description = models.CharField(max_length=2500)
    downloadable = models.BooleanField()
    price = models.FloatField()
    image = models.ForeignKey(
        'wagtailimages.Image',
        null=True,
        blank=True,
        on_delete=models.SET_NULL,
        related_name='+'
    )
    # define the content_panels
    panels = [
        FieldPanel('name'),
        FieldPanel('description'),
        FieldPanel('downloadable'),
        FieldPanel('price'),
        ImageChooserPanel('image'),
    ]
class Order(TimeStampedModel, ClusterableModel):
    '''
    Example of use outside of the admin:
    p = Product.objects.first()
    order = Order.objects.create(client_email='someone@hotmail.com', gift_amount=0)
    quantities = ProductInOrderCount(product=p, order=order, quantity=2)
    quantities.save()
    for itm in order.productinordercount_set.all():
        print(itm.quantity)
    '''
    is_fulfilled = models.BooleanField(default=False)
    is_paid_for = models.BooleanField(default=False)
    client_email = models.EmailField(blank=False)
    gift_amount = models.PositiveIntegerField()
    # products = M2MTHROUGH
    # the through model stores the quantity
    products = models.ManyToManyField(Product, through='ProductInOrderCount')
    content_panels = [
        FieldPanel('is_fulfilled'),
        FieldPanel('is_paid_for'),
        FieldPanel('client_email'),
        FieldPanel('gift_amount'),
        InlinePanel('products'),
    ]
class OrderModelAdmin(ModelAdmin):
    model = Order
    menu_label = 'Orders'
    ...
modeladmin_register(OrderModelAdmin)
class ProductInOrderCount(Orderable):
    product = models.ForeignKey(Product, on_delete=models.CASCADE)
    order = models.ForeignKey(Order, on_delete=models.CASCADE)
    quantity = models.PositiveIntegerField()
The tricky thing is that I get the error Cannot set values on a ManyToManyField which specifies an intermediary model. Or I simply don't get an inline panel, but rather a select.
I am assuming this is the case because the create and add methods do not work on through models, is that the case?
If so could you suggest a way I can rewrite the app so as to allow me to create orders with products in the admin and in my code?
InlinePanel only works with one-to-many ParentalKey relations, not a ManyToManyField. That shouldn't be a problem, because ParentalKey is a good fit for this case:
A ManyToManyField with a through model is really just two one-to-many relations back to back;
ParentalKey is designed for relations that are closely tied to the parent model, in the sense that they're always edited, validated and saved as a single unit. This is true for the relation between ProductInOrderCount and Order (a ProductInOrderCount record is conceptually part of an Order), but not the relation between ProductInOrderCount and Product (a ProductInOrderCount is not part of a Product).
This would give you a model definition like:
class ProductInOrderCount(Orderable):
    order = ParentalKey(Order, on_delete=models.CASCADE, related_name='ordered_products')
    product = models.ForeignKey(Product, on_delete=models.CASCADE)
    quantity = models.PositiveIntegerField()
Your Order model can then have an InlinePanel('ordered_products'), and the products field can be omitted.