Why does Django take so long to get the result list from a QuerySet?

I am studying the Django ORM. I couldn't find an answer by searching, so I'd appreciate it if someone could point me to a relevant resource.
My models are as follows. user1 has 2 accounts, and 500,000 transactions belong to one of the accounts.
class Account(models.Model):
    class Meta:
        db_table = 'account'
        ordering = ['created_at']
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    account = models.CharField(max_length=20, null=False, blank=False, primary_key=True)
    balance = models.PositiveBigIntegerField(default=0)
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)
class AccountTransaction(models.Model):
    class Meta:
        db_table = 'account_transaction'
        ordering = ['tran_time']
        indexes = [
            models.Index(fields=['tran_type', 'tran_time', ]),
        ]
    account = models.ForeignKey(Account, on_delete=models.CASCADE)
    tran_amt = models.PositiveBigIntegerField()
    balance = models.PositiveBigIntegerField()
    tran_type = models.CharField(max_length=10, null=False, blank=False)
    tran_detail = models.CharField(max_length=100, null=True, default="")
    tran_time = models.DateTimeField(auto_now_add=True)
The query time for the above model is as follows.
start = time.time()
rs = request.user.account_set.all().get(account="0000000010").accounttransaction_set.all()
count = rs.count()
print('>>all')
print(time.time() - start) # 0.028000831604003906
start = time.time()
q = Q(tran_time__date__range = ("2000-01-01", "2000-01-03"))
rs = request.user.account_set.all().get(account="0000000010").accounttransaction_set.filter(q)
print('>>filter')
print(time.time() - start) # 0.0019981861114501953
start = time.time()
result = list(rs)
print('>>offset')
print(time.time() - start) # 5.4373579025268555
The filtered QuerySet contains about 3,500 rows in total (3,500 of the 500,000 records were selected).
I've tried a number of things, such as slicing the QuerySet (rs) with an offset, but it still takes a long time to get the actual values out of it.
I know that a QuerySet only loads data when the actual values are needed, for example on count(), but what did I do wrong?

From https://docs.djangoproject.com/en/4.1/topics/db/queries/#querysets-are-lazy:
QuerySets are lazy – the act of creating a QuerySet doesn’t involve
any database activity. You can stack filters together all day long,
and Django won’t actually run the query until the QuerySet is
evaluated. Take a look at this example:
q = Entry.objects.filter(headline__startswith="What")
q = q.filter(pub_date__lte=datetime.date.today())
q = q.exclude(body_text__icontains="food")
print(q)
Though this looks like three database hits, in fact it hits the
database only once, at the last line (print(q)). In general, the
results of a QuerySet aren’t fetched from the database until you “ask”
for them. When you do, the QuerySet is evaluated by accessing the
database. For more details on exactly when evaluation takes place, see
When QuerySets are evaluated.
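In your example, the database is hit only when you call list(rs). The filter() call just builds the query, so the 5.4 seconds measured around list(rs) is the time the filtered query itself takes to run and to build the result objects.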
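To make the timing pattern above concrete, here is a minimal sketch of lazy evaluation. It uses the standard-library sqlite3 module and a toy class, LazyRows, which is a hypothetical stand-in and not Django's QuerySet. The table and column names only loosely mirror the question.

```python
import sqlite3


class LazyRows:
    """Toy stand-in for a lazy query object (not Django's QuerySet)."""

    def __init__(self, conn, sql, params=()):
        self.conn = conn
        self.sql = sql
        self.params = params
        self.executions = 0  # how many times the database was actually hit

    def filter(self, condition, *params):
        # Build a new lazy object with an extra condition; no query runs here.
        joiner = " AND " if " WHERE " in self.sql else " WHERE "
        return LazyRows(self.conn, self.sql + joiner + condition, self.params + params)

    def __iter__(self):
        # Iterating (for example via list()) is what finally runs the query.
        self.executions += 1
        return iter(self.conn.execute(self.sql, self.params).fetchall())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account_transaction (id INTEGER PRIMARY KEY, tran_amt INTEGER)")
conn.executemany(
    "INSERT INTO account_transaction (tran_amt) VALUES (?)",
    [(amount,) for amount in range(1000)],
)

rs = LazyRows(conn, "SELECT id, tran_amt FROM account_transaction").filter("tran_amt >= ?", 990)
hits_before = rs.executions  # still 0: only the SQL text has been built
result = list(rs)            # the query runs here
hits_after = rs.executions
print(hits_before, hits_after, len(result))  # prints: 0 1 10
```

Timing the filter() line in this sketch measures string building only; the database work shows up on the list() line, which matches the measurements in the question.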

Related

Django ORM double join with Sum

I searched for a similar case on SO and Google with no luck.
SHORT EXPLANATION
I have transactions that belong to an account, and an account belongs to an account aggrupation.
I want to get a list of accounts aggrupations with their accounts, and I want to know the total balance of each account (an account's balance is calculated by adding up the amounts of all its transactions).
LONG EXPLANATION
I have the following models (I include mixins for the sake of completeness):
class UniqueNameMixin(models.Model):
    class Meta:
        abstract = True
    name = models.CharField(verbose_name=_('name'), max_length=100, unique=True)
    def __str__(self):
        return self.name
class PercentageMixin(UniqueNameMixin):
    class Meta:
        abstract = True
    _validators = [MinValueValidator(0), MaxValueValidator(100)]
    current_percentage = models.DecimalField(max_digits=5,
                                             decimal_places=2,
                                             validators=_validators,
                                             null=True,
                                             blank=True)
    ideal_percentage = models.DecimalField(max_digits=5,
                                           decimal_places=2,
                                           validators=_validators,
                                           null=True,
                                           blank=True)
class AccountsAggrupation(PercentageMixin):
    pass
class Account(PercentageMixin):
    aggrupation = models.ForeignKey(AccountsAggrupation, models.PROTECT)
class Transaction(models.Model):
    date = models.DateField()
    concept = models.ForeignKey(Concept, models.PROTECT, blank=True, null=True)
    amount = models.DecimalField(max_digits=10, decimal_places=2)
    account = models.ForeignKey(Account, models.PROTECT)
    detail = models.CharField(max_length=100, blank=True, null=True)
    def __str__(self):
        return '{} - {} - {} - {}'.format(self.date, self.concept, self.amount, self.account)
I want to be able to do this in Django ORM:
select ca.*, ca2.*, sum(ct.amount)
from core_accountsaggrupation ca
join core_account ca2 on ca2.aggrupation_id = ca.id
join core_transaction ct on ct.account_id = ca2.id
group by ca2.name
order by ca.name;

It would appear that nested navigation through reverse sets is not possible:
Wrong: AccountsAggrupation.objects.prefetch_related('account_set__transaction_set')
(or any similar approach). The way to make this work is to go the other way around: from transaction to account, and then to the accounts aggrupation.
But since I needed a dict keyed by accounts aggrupation, with each key pointing to its set of accounts (and the balance of each), I ended up doing this:
def get_accounts_aggrupations_data(self):
    accounts_aggrupations_data = {}
    accounts_balances = Account.objects.annotate(balance=Sum('transaction__amount'))
    for aggrupation in self.queryset:
        aggrupations_accounts = accounts_balances.filter(aggrupation__id=aggrupation.id)
        aggrupation.balance = aggrupations_accounts.aggregate(Sum('balance'))['balance__sum']
        accounts_aggrupations_data[aggrupation] = aggrupations_accounts
    current_month = datetime.today().replace(day=1).date()
    date = current_month.strftime('%B %Y')
    total_balance = Transaction.objects.aggregate(Sum('amount'))['amount__sum']
    return {'balances': accounts_aggrupations_data, 'date': date, 'total_balance': total_balance}
Note that since I'm iterating through the accounts aggrupations, that query (self.queryset, which resolves to AccountsAggrupation.objects.all()) is executed against the DB.
The per-aggrupation account querysets are not executed yet, because I'm not iterating through them (that happens when the template consumes them).
Also note that the dictionary accounts_aggrupations_data has an AccountsAggrupation object as its key.
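For readers who want to see the shape of the result, the following is a rough sketch of the join-and-sum from the question, run with the standard-library sqlite3 module rather than the Django ORM. The table names come from the SQL in the question. The sample rows, the integer amounts, and the grouping by id instead of name are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE core_accountsaggrupation (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE core_account (
        id INTEGER PRIMARY KEY, name TEXT UNIQUE,
        aggrupation_id INTEGER REFERENCES core_accountsaggrupation(id));
    CREATE TABLE core_transaction (
        id INTEGER PRIMARY KEY, amount INTEGER,
        account_id INTEGER REFERENCES core_account(id));
    -- Made-up sample data
    INSERT INTO core_accountsaggrupation VALUES (1, 'Bank'), (2, 'Cash');
    INSERT INTO core_account VALUES (1, 'Checking', 1), (2, 'Savings', 1), (3, 'Wallet', 2);
    INSERT INTO core_transaction (amount, account_id)
        VALUES (100, 1), (-30, 1), (500, 2), (20, 3), (5, 3);
""")

# One query: join the three tables and sum the transactions per account.
rows = conn.execute("""
    SELECT ca.name, ca2.name, SUM(ct.amount)
    FROM core_accountsaggrupation ca
    JOIN core_account ca2 ON ca2.aggrupation_id = ca.id
    JOIN core_transaction ct ON ct.account_id = ca2.id
    GROUP BY ca.id, ca2.id
    ORDER BY ca.name, ca2.name
""").fetchall()

# Reshape the flat rows into {aggrupation: {account: balance}}.
balances = {}
for aggrupation, account, balance in rows:
    balances.setdefault(aggrupation, {})[account] = balance

print(balances)
# prints: {'Bank': {'Checking': 70, 'Savings': 500}, 'Cash': {'Wallet': 25}}
```

The reshaping step at the end plays the same role as the accounts_aggrupations_data dictionary above, only with plain names as keys instead of model instances.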

Django order by issue when used Q filter with multiple order by and distinct

I am working on a photos project where a user can download or like a photo (and perform other operations as well). I have two models to track this information. Below are the models used (the database is Postgres).
# Photo model stores photos
# download_count, like_count is stored in the same model as well for easier querying
class Photo(models.Model):
    name = models.CharField(max_length=100, null=True, blank=True)
    image = models.ForeignKey(Image, null=True, on_delete=models.CASCADE)
    download_count = models.IntegerField(default=0)
    like_count = models.IntegerField(default=0)
    views = GenericRelation(
        'Stat', related_name='photo_view',
        related_query_name='photo_view', null=True, blank=True)
    downloads = GenericRelation(
        'Stat', related_name='photo_download',
        related_query_name='photo_download', null=True, blank=True)
# Stat has generic relationship with photos. So that it can store any stats information
class Stat(models.Model):
    VIEW = 'V'
    DOWNLOAD = 'D'
    LIKE = 'L'
    STAT_TYPE = (
        (VIEW, 'View'),
        (DOWNLOAD, 'Download'),
        (LIKE, 'Like'),
    )
    user = models.ForeignKey(
        User, null=True, blank=True, on_delete=models.SET_NULL)
    content_type = models.ForeignKey(
        ContentType, on_delete=models.CASCADE, default=0)
    object_id = models.PositiveIntegerField()
    content_object = GenericForeignKey()
    stat_type = models.CharField(max_length=2, choices=STAT_TYPE, default=VIEW)
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)
My requirement is to fetch the photos that are popular this week. The popularity score should take the like count and the download count into account.
I wrote the query below to get this week's popular photos; it checks for likes or downloads created this week.
# week number
current_week = date.today().isocalendar()[1]
photos = Photo.objects.filter(Q(likes__created_at__week=current_week) | Q(downloads__created_at__week=current_week))\
.order_by('id', 'download_count', 'like_count')\
.distinct('id')
Problem: With the above query, the result set is always ordered by id, even though other fields are listed.
Requirement: The photos should be ordered by the sum of total likes and downloads, so that they are sorted by popularity.
Please suggest a way to achieve this with database performance in mind.
Thank you

You can use annotate() and F objects for such cases (the leading minus sign in order_by puts the most popular photos first):
photos = Photo.objects.annotate(like_download=F('download_count') + F('like_count')).order_by('-like_download').distinct()

Entry.objects.filter(pub_date__year=2005).order_by('-pub_date', 'headline')
The result above will be ordered by pub_date descending, then by headline ascending. The negative sign in front of "-pub_date" indicates descending order.
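So if you list id first, the results are ordered by id first. Because every id is unique, the later fields (download_count, like_count) never get to break a tie, which is why your result always looks ordered by id alone.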
You can use annotate in your case.
Per-object summaries can be generated using the annotate() clause. When an annotate() clause is specified, each object in the QuerySet will be annotated with the specified values.
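The effect of the field order can be illustrated in plain Python, without Django. In this sketch the Photo namedtuple and its sample values are made up, and sorted() with a tuple key stands in for a multi-field order_by().

```python
from collections import namedtuple

# Hypothetical in-memory stand-in for Photo rows (not the Django model).
Photo = namedtuple("Photo", "id download_count like_count")
photos = [Photo(1, 5, 1), Photo(2, 50, 30), Photo(3, 0, 2)]

# Like order_by('id', 'download_count', 'like_count'): id is unique,
# so the second and third keys are never consulted.
by_id_first = sorted(photos, key=lambda p: (p.id, p.download_count, p.like_count))

# Like annotating score = download_count + like_count and ordering by '-score'.
by_score = sorted(photos, key=lambda p: p.download_count + p.like_count, reverse=True)

print([p.id for p in by_id_first])  # prints: [1, 2, 3]
print([p.id for p in by_score])     # prints: [2, 1, 3]
```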

Thanks to #biplove-lamichhane for suggesting the annotate function. I was able to achieve the desired result using the query below:
photos = Photo.objects.filter(is_active=True)\
.filter(Q(likes__created_at__week=current_week) | Q(downloads__created_at__week=current_week))\
.annotate(score=F('download_count') + F('like_count'))\
.order_by('-score')\
.distinct()

Django ORM exclude fails

I have a problem with my query: it works fine with filter(), but it does not work with exclude().
My models:
class Dictionary(DateTimeModel):
    base_word = models.ForeignKey(BaseDictionary, related_name=_('dict_words'))
    word = models.CharField(max_length=64)
    version = models.ForeignKey(Version)
class FrequencyData(DateTimeModel):
    word = models.ForeignKey(Dictionary, related_name=_('frequency_data'))
    count = models.BigIntegerField(null=True, blank=True)
    source = models.ForeignKey(Source, related_name=_('frequency_data'), null=True, blank=True)
    user = models.ForeignKey(settings.AUTH_USER_MODEL, related_name=_('frequency_data'))
    user_ip_address = models.GenericIPAddressField(null=True, blank=True)
    date_of_checking = models.DateTimeField(null=True, blank=True)
    is_checked = models.BooleanField(default=False)
I want to get some words from Dictionary whose frequencies are not from a given user:
Dictionary.objects.prefetch_related('frequency_data').filter(frequency_data__user=1)[:100] - OK
Dictionary.objects.prefetch_related('frequency_data').exclude(frequency_data__user=1)[:100] - the processor goes up to 100% and the query keeps loading
The same happens without prefetch_related. What is wrong with exclude?
EDIT
Dictionary db table - 120k rows
FrequencyData - 160k rows
EDIT2
psql(9.6.6)

Django query: Can you nest Annotations?

I have a Django query where I want to group the number of test attempts by test_id and get an average among each test.
The test_attempts table logs each test attempt a user makes on a given test. I want to find the average number of attempts per test
Here is my query:
average = TestAttempts.objects.values('test_id').annotate(Avg(Count('test_id'))).filter(user_id=id)
I am getting the following error:
'Count' object has no attribute 'split'
Is there a way to handle this without having to write raw SQL?
UPDATE:
Here is the TestAttempts model:
class TestAttempts(models.Model):
    id = models.IntegerField(primary_key=True)
    user_id = models.IntegerField()
    test_id = models.IntegerField()
    test_grade = models.DecimalField(max_digits=6, decimal_places=1)
    grade_date_time = models.DateTimeField()
    start_time = models.DateTimeField()
    seconds_taken = models.IntegerField()
    taking_for_ce_credit = models.IntegerField()
    ip_address = models.CharField(max_length=25)
    grade_points = models.DecimalField(null=True, max_digits=4, decimal_places=1, blank=True)
    passing_percentage = models.IntegerField(null=True, blank=True)
    passed = models.IntegerField()
    class Meta:
        db_table = 'test_attempts'

You want a single number, an average number of attempts over all tests? Do you have a Test model?
This will work then:
average = (Test.objects.filter(testattempt__user_id=id)
.annotate(c=Count('testattempt'))
.aggregate(a=Avg('c'))['a'])
If you don't have a TestAttempt → Test relationship, but only a test_id field, then this should work:
average = (TestAttempts.objects.filter(user_id=2)
.values('test_id')
.annotate(c=Count('pk'))
.aggregate(a=Avg('c')))
However, this does not work for me on SQLite, and I don't have another database at hand to test it.
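As a point of reference for what the annotate-then-aggregate chain is meant to compute, here is a small sketch in raw SQL using the standard-library sqlite3 module instead of the ORM. The table and column names follow the model in the question; the sample attempts are made up. The inner query counts attempts per test for one user, and the outer query averages those counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_attempts (id INTEGER PRIMARY KEY, user_id INTEGER, test_id INTEGER)"
)
# Made-up data: user 2 took test 10 three times and test 11 once;
# user 3 took test 10 five times.
attempts = [(2, 10)] * 3 + [(2, 11)] + [(3, 10)] * 5
conn.executemany("INSERT INTO test_attempts (user_id, test_id) VALUES (?, ?)", attempts)

average = conn.execute(
    """
    SELECT AVG(c)
    FROM (
        SELECT COUNT(*) AS c
        FROM test_attempts
        WHERE user_id = ?
        GROUP BY test_id
    )
    """,
    (2,),
).fetchone()[0]

print(average)  # prints: 2.0, the mean of 3 and 1
```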

Django .count() on ManyToMany has become very slow

I have a Django project that consists of a scraper of our inventory, run on the server as a cronjob every few hours, and the Django Admin page - which we use to view / access all items.
We have about 30 items that are indexed.
So each 'Scraping Operation' consists of about 30 individual 'Search Operations', each of which gets around 500 results per run.
Now, this description is a bit confusing, so I've included the models below.
class ScrapingOperation(models.Model):
    date_started = models.DateTimeField(default=timezone.now, editable=True)
    date_completed = models.DateTimeField(blank=True, null=True)
    completed = models.BooleanField(default=False)
    round = models.IntegerField(default=-1)
    trusted = models.BooleanField(default=True)
class Search(models.Model):
    item = models.ForeignKey(Item, on_delete=models.CASCADE)
    date_started = models.DateTimeField(default=timezone.now, editable=True)
    date_completed = models.DateTimeField(blank=True, null=True)
    completed = models.BooleanField(default=False)
    round = models.IntegerField(default=1)
    scraping_operation = models.ForeignKey(ScrapingOperation, on_delete=models.CASCADE, related_name='searches')
    trusted = models.BooleanField(default=True)
    def total_ads(self):
        return self.ads.count()
class Ad(models.Model):
    item = models.ForeignKey(Item, on_delete=models.CASCADE, related_name='ads')
    title = models.CharField(max_length=500)
    price = models.DecimalField(max_digits=8, decimal_places=2, null=True)
    first_seen = models.DateTimeField(default=timezone.now, editable=True)
    last_seen = models.DateTimeField(default=timezone.now, editable=True)
    def __str__(self):
        return self.title
Now here is the problem we've run into.
On the admin pages for both the Search model and the ScrapingOperation model, we would like to see the number of ads scraped for that particular object (represented as a number). This works fine for our searches, but our implementation for the ScrapingOperation has run into problems.
This is the code that we use:
class ScrapingOperationAdmin(admin.ModelAdmin):
    list_display = ['id', 'completed', 'trusted', 'date_started', 'date_completed', 'number_of_ads']
    list_filter = ('completed', 'trusted')
    view_on_site = False
    inlines = [
        SearchInlineAdmin,
    ]
    def number_of_ads(self, instance):
        total_ads = 0
        for search in instance.searches.all():
            total_ads += search.ads.count()
        return total_ads
The problem that we have run into is this: the code works and provides the correct number. However, after about 10 ScrapingOperations we noticed that the site started to slow down when loading the page. We are now up to 60 ScrapingOperations, and when we click the ScrapingOperations page in the Django admin it takes almost a minute to load.
Is there a more efficient way to do this? We thought about saving the total number of ads to the model itself, but it seems wasteful to dedicate a field to information that should be accessible with a simple .count() call. Yet our query is evidently so inefficient that the entire site locks down for almost a minute when it is executed. Does anyone have an idea of what we are doing wrong?
Based on the comments below I am currently working on the following solution:
def number_of_ads(self, instance):
    total_ads = 0
    searches = Search.objects.filter(scraping_operation=instance).annotate(Count('ads'))
    for search in searches:
        total_ads += search.ads__count
    return total_ads

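Use an annotation when getting the queryset:
from django.db.models import Count
class ScrapingOperationAdmin(admin.ModelAdmin):
    ...
    def get_queryset(self, request):
        qs = super().get_queryset(request)
        # annotate() returns a new queryset, so its result must be returned
        return qs.annotate(number_of_ads=Count('searches__ads'))
    def number_of_ads(self, instance):
        # Read the value computed by the annotation instead of counting per search
        return instance.number_of_ads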
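To show why a single annotated query helps, here is a rough sketch that contrasts the per-search counting loop with one grouped query. It uses the standard-library sqlite3 module, not Django. The search and search_ads tables, their columns, and the sample rows are hypothetical stand-ins for the Search model and its many-to-many link to Ad.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE search (id INTEGER PRIMARY KEY, scraping_operation_id INTEGER);
    CREATE TABLE search_ads (search_id INTEGER, ad_id INTEGER);
    -- Made-up data: operation 1 has searches 1 and 2, operation 2 has search 3.
    INSERT INTO search VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO search_ads VALUES
        (1, 101), (1, 102), (1, 103), (2, 104), (2, 105),
        (3, 106), (3, 107), (3, 108), (3, 109);
""")

stats = {"queries": 0}


def run(sql, params=()):
    # Execute a query and keep track of how many were issued.
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()


# Loop approach: one query per operation plus one COUNT per search.
looped = {}
for operation_id in (1, 2):
    total = 0
    searches = run("SELECT id FROM search WHERE scraping_operation_id = ?", (operation_id,))
    for (search_id,) in searches:
        total += run("SELECT COUNT(*) FROM search_ads WHERE search_id = ?", (search_id,))[0][0]
    looped[operation_id] = total
loop_queries = stats["queries"]

# Grouped approach: a single JOIN + GROUP BY returns every total at once.
stats["queries"] = 0
grouped = dict(run("""
    SELECT s.scraping_operation_id, COUNT(sa.ad_id)
    FROM search s
    LEFT JOIN search_ads sa ON sa.search_id = s.id
    GROUP BY s.scraping_operation_id
"""))
grouped_queries = stats["queries"]

print(looped, loop_queries)        # prints: {1: 5, 2: 4} 5
print(grouped, grouped_queries)    # prints: {1: 5, 2: 4} 1
```

The loop issues a number of queries that grows with the number of operations and searches on the page, while the grouped query stays at one, which is the effect the annotation in the answer aims for.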
