What is the best way to keep the load on the database or application server low for this use case?
Let's say I want to build a web application that has an overview page for each user. The overview page shows the user's data in aggregated form. For example, if it were a library application, it would show how many times the user visited the library in total, how many books he read in total, how many books were returned late in total, and how many minutes he spent in the building. Each time the user visits the overview page, the up-to-date values should be displayed, and the numbers change as the user interacts with the site.
What I could do is run several counts in the database on every refresh of the overview page. But that would be expensive.
views.py
def overview(request, userID):
    booksCount = Book.objects.count()
    booksReadCount = Book.objects.filter(UserID=userID, Status='read').count()
    # ... many more, same way
    libraryVisitedCount = LibraryVisits.objects.filter(UserID=userID).count()
    # many counts like these on different tables for the user
    data = {
        "booksCount" : booksCount,
        "booksReadCount" : booksReadCount,
        # ... many more
        "libraryVisitedCount" : libraryVisitedCount
    }
    return render(..., context=data)
I have thought about storing a JSON object with the data for the overview page in a database table, and updating that JSON each time an event happens on the site that affects one of the counts.
Or I could use a materialized view, but to refresh it I would have to recalculate the data for all users each time, right?
Any other ideas? I'm using the Django web framework and a PostgreSQL database.
TL;DR: Is there a better way to get these counts than running several count queries in the database each time?
Thanks.
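The idea of keeping a per-user summary that is updated whenever an event happens (often called a counter cache) can be sketched with database triggers, so the overview page becomes a single primary-key lookup. The sketch below uses SQLite from the Python standard library so that it runs on its own; the table and trigger names are hypothetical, and in PostgreSQL the same triggers would be written in PL/pgSQL (or the updates done from Django post_save/post_delete signals).

```python
import sqlite3

SCHEMA = """
CREATE TABLE book (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, status TEXT NOT NULL);
CREATE TABLE library_visit (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);
-- One pre-aggregated row per user; the overview page only reads this table.
CREATE TABLE user_stats (
    user_id INTEGER PRIMARY KEY,
    books_read INTEGER NOT NULL DEFAULT 0,
    library_visits INTEGER NOT NULL DEFAULT 0
);
-- Insert triggers make sure the user's stats row exists, then adjust it.
-- (NEW.status = 'read') evaluates to 1 or 0 in SQLite.
CREATE TRIGGER book_insert AFTER INSERT ON book BEGIN
    INSERT OR IGNORE INTO user_stats (user_id) VALUES (NEW.user_id);
    UPDATE user_stats SET books_read = books_read + (NEW.status = 'read')
    WHERE user_id = NEW.user_id;
END;
-- Assumes a book's user_id never changes, only its status.
CREATE TRIGGER book_status_update AFTER UPDATE OF status ON book BEGIN
    UPDATE user_stats
    SET books_read = books_read + (NEW.status = 'read') - (OLD.status = 'read')
    WHERE user_id = NEW.user_id;
END;
CREATE TRIGGER book_delete AFTER DELETE ON book BEGIN
    UPDATE user_stats SET books_read = books_read - (OLD.status = 'read')
    WHERE user_id = OLD.user_id;
END;
CREATE TRIGGER visit_insert AFTER INSERT ON library_visit BEGIN
    INSERT OR IGNORE INTO user_stats (user_id) VALUES (NEW.user_id);
    UPDATE user_stats SET library_visits = library_visits + 1
    WHERE user_id = NEW.user_id;
END;
"""


def create_schema(conn):
    conn.executescript(SCHEMA)


def overview(conn, user_id):
    """Read the pre-aggregated counts with a single primary-key lookup."""
    row = conn.execute(
        "SELECT books_read, library_visits FROM user_stats WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    books_read, visits = row if row else (0, 0)
    return {"books_read": books_read, "library_visits": visits}
```

The trade-off is that every write does a little extra work so that every read is cheap, which suits a page that is viewed far more often than the underlying data changes.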
Let's say that in the Book, LibraryVisit, etc. models there is a ForeignKey to the User model with a related_name like this:
class Book(models.Model):
    UserID = models.ForeignKey(User, related_name='books', on_delete=models.DO_NOTHING)
class LibraryVisit(models.Model):
    UserID = models.ForeignKey(User, related_name='library_visit', on_delete=models.DO_NOTHING)
Then you can use annotations with conditional aggregation, so that all the per-user counts come back in a single query:
from django.contrib.auth.models import User
from django.db.models import Count, Q
from django.shortcuts import render
def overview(request, userID):
    users = User.objects.filter(pk=userID)
    users = users.annotate(
        # Conditional aggregation: count only the user's books with Status='read'.
        # distinct=True is needed because joining two reverse relations
        # (books and library_visit) in one query multiplies the rows being counted.
        book_read_count=Count('books', filter=Q(books__Status='read'), distinct=True),
        library_visited_count=Count('library_visit', distinct=True),
    )
    # FYI: please use snake_case when defining object attributes (like model fields), as per the PEP 8 style guide
    data = {
        "user_object": users.first(),  # first item of the User queryset; the DB is hit once in this step
        "booksCount": Book.objects.count()
    }
    # access the counts in the view like this:
    # data["user_object"].book_read_count
    # data["user_object"].library_visited_count
    return render(..., context=data)
# 'books' and 'library_visit' in the lookups are the related_name values
And render counts in template like this:
{{ user_object.book_read_count }}
{{ user_object.library_visited_count }}
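Why distinct counting matters when one query aggregates over two relations can be seen in plain SQL. The sketch below is roughly the shape of query that annotating across two reverse relations produces (two LEFT JOINs plus GROUP BY); it uses SQLite from the standard library and hypothetical table names, so it illustrates the effect rather than reproducing the exact SQL Django emits.

```python
import sqlite3


def build_demo_db():
    """Hypothetical data: one user with 3 books (2 of them read) and 3 library visits."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE app_user (id INTEGER PRIMARY KEY);
        CREATE TABLE book (id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT);
        CREATE TABLE library_visit (id INTEGER PRIMARY KEY, user_id INTEGER);
        INSERT INTO app_user (id) VALUES (1);
        INSERT INTO book (user_id, status) VALUES (1, 'read'), (1, 'read'), (1, 'reading');
        INSERT INTO library_visit (user_id) VALUES (1), (1), (1);
    """)
    return conn


QUERY = """
SELECT
    -- Naive aggregates: each book row is repeated once per visit row (3 x 3 = 9 rows)
    SUM(CASE WHEN b.status = 'read' THEN 1 ELSE 0 END) AS naive_books_read,
    COUNT(v.id) AS naive_visits,
    -- Counting distinct primary keys removes the duplicates introduced by the joins
    COUNT(DISTINCT CASE WHEN b.status = 'read' THEN b.id END) AS books_read,
    COUNT(DISTINCT v.id) AS visits
FROM app_user u
LEFT JOIN book b ON b.user_id = u.id
LEFT JOIN library_visit v ON v.user_id = u.id
WHERE u.id = ?
GROUP BY u.id
"""


def overview_counts(conn, user_id):
    """Return (naive_books_read, naive_visits, books_read, visits) for one user."""
    return conn.execute(QUERY, (user_id,)).fetchone()
```

For the demo data the naive columns come out as 6 and 9, while the distinct ones give the correct 2 and 3.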
I am having trouble writing a query using the Django ORM: I want to find the latest record in each group. I store chat messages in a model, and I want to show the latest chat of each user on the home screen, with the user who sent the most recent chat at the top, just like in WhatsApp, Skype or similar apps. Currently, I am using the following query:
Chats.objects.all().order_by('user_id', '-date').distinct('user_id')
Using this I am able to get the latest chat of each user, but I am not able to get the order right. The result of the query is in the order in which the users were created in the database, which I understand is correct, but I want to show the user who sent the latest chat at the top.
My models.py:
class Chats(models.Model):
    user_id = models.ForeignKey(User, on_delete=models.CASCADE)
    chat = models.CharField(max_length=1023, null=True, blank=True)
    date = models.DateTimeField(auto_now_add=True)
Thank you so much! Please let me know if any other information is required.
Option 1: Order on the Django/Python layer
The items are first sorted by user_id, and only in case of a tie by date, so that DISTINCT ON keeps the one with the latest date. PostgreSQL requires the DISTINCT ON expressions to match the leftmost ORDER BY expressions, so user_id has to come first in the ordering. That means you eventually get one Chats object per user, ordered by user_id.
I think the simplest option here is to sort at the Django/Python level: wrap the queryset in a list and sort it by date:
from operator import attrgetter
items = list(Chats.objects.order_by('user_id', '-date').distinct('user_id'))
items.sort(key=attrgetter('date'), reverse=True)
# work with items
and then render the items in the template.
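The two steps that Option 1 splits between the database (keep the newest chat per user) and Python (sort by date) can be illustrated with plain objects. This is a hypothetical stand-in: Chat here is a dataclass rather than the Django model, and the first loop does in Python what distinct('user_id') does in PostgreSQL.

```python
from dataclasses import dataclass
from datetime import datetime
from operator import attrgetter


@dataclass
class Chat:
    user_id: int
    chat: str
    date: datetime


def latest_chat_per_user(chats):
    """Keep each user's newest chat, then order users by that chat's date, newest first."""
    latest = {}
    for chat in chats:
        current = latest.get(chat.user_id)
        if current is None or chat.date > current.date:
            latest[chat.user_id] = chat
    # Same sort as in Option 1: by date, most recent first
    return sorted(latest.values(), key=attrgetter('date'), reverse=True)
```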
Option 2: Annotate the User model instead
Another option is to annotate the User model and thus work with a QuerySet of User objects:
from django.db.models import Max, OuterRef, Subquery
User.objects.filter(
    chats__isnull=False
).annotate(
    last_date=Max('chats__date'),
    last_message=Subquery(
        Chats.objects.filter(user_id=OuterRef('pk')).order_by('-date').values('chat')[:1]
    )
).order_by('-last_date')
Here the User objects will have an extra attribute .last_date with the latest date time of the object, and .last_message with that message.
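In SQL terms, Option 2 is a grouped query with a correlated subquery for the last message, ordered by the latest date. The sketch below shows that shape with SQLite from the standard library; the table names auth_user and chats are assumptions for the demo (Django's real table name for Chats would carry the app label), and dates are stored as ISO strings so they sort correctly.

```python
import sqlite3


def create_demo_db():
    """Hypothetical data: users 1 and 2 have chats, user 3 has none."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE auth_user (id INTEGER PRIMARY KEY);
        CREATE TABLE chats (id INTEGER PRIMARY KEY, user_id INTEGER, chat TEXT, date TEXT);
        INSERT INTO auth_user (id) VALUES (1), (2), (3);
        INSERT INTO chats (user_id, chat, date) VALUES
            (1, 'old', '2024-01-01 09:00'),
            (1, 'new', '2024-01-03 09:00'),
            (2, 'hello', '2024-01-02 09:00');
    """)
    return conn


def latest_chats(conn):
    """One row per user who has chats: (user_id, last_date, last_message), newest first."""
    return conn.execute("""
        SELECT u.id,
               MAX(c.date) AS last_date,
               (SELECT c2.chat FROM chats c2
                WHERE c2.user_id = u.id
                ORDER BY c2.date DESC
                LIMIT 1) AS last_message
        FROM auth_user u
        JOIN chats c ON c.user_id = u.id   -- inner join plays the role of chats__isnull=False
        GROUP BY u.id
        ORDER BY last_date DESC
    """).fetchall()
```

Here the sorting happens in the database, so the result can still be sliced or paginated without loading every row into Python.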
Note: It is normally better to make use of the settings.AUTH_USER_MODEL [Django-doc] to refer to the user model, than to use the User model [Django-doc] directly. For more information you can see the referencing the User model section of the documentation.
I have a large table of data (~30 MB) that I converted into a model in Django. Now I want to access that data through a REST API.
I've successfully installed the Django REST framework, but I'm looking for a way to automatically create a URL for each field in my model. My model has about 100 fields, and each field has about 100,000 entries.
If my model is named Sample,
models.py
class Sample(models.Model):
    index = models.IntegerField(primary_key=True)
    year = models.IntegerField(blank=True, null=True)
    name = models.TextField(blank=True, null=True)
    ...97 more fields...
then I can access the whole model using Django REST framework like this:
urls.py
from rest_framework import routers, serializers, viewsets
class SampleSerializer(serializers.HyperlinkedModelSerializer):
    class Meta:
        model = Sample
        fields = (...)  # all 100 field names
class SampleViewSet(viewsets.ModelViewSet):
    queryset = Sample.objects.all()
    serializer_class = SampleSerializer
router = routers.DefaultRouter()
router.register(r'sample', SampleViewSet)
But of course my browser can't load all of that data in a reasonable amount of time. I could manually make a different class and URL for each field, but there must be a better way... I want to be able to go to my_site.com/sample/year (for example) and have it list all of the years in JSON format, or my_site.com/sample/name and list all the names, etc.
Please help me figure out how to do this, thanks!
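You might be able to do that using a custom viewset route (an extra action).
You could have something like this:
from rest_framework import status, viewsets
from rest_framework.decorators import action
from rest_framework.response import Response
class SampleViewSet(viewsets.ModelViewSet):
    queryset = Sample.objects.all()
    serializer_class = SampleSerializer
    @action(detail=False, methods=['get'])  # @list_route() in DRF versions before 3.8
    def sample_field(self, request):
        # The field name comes from the query string, e.g. ?field=year
        desired_field = request.query_params.get('field')
        # Only allow real model fields, so arbitrary lookups can't be passed in
        if desired_field not in {f.name for f in Sample._meta.get_fields()}:
            return Response({'detail': 'Unknown or missing field.'}, status=status.HTTP_400_BAD_REQUEST)
        values = Sample.objects.values_list(desired_field, flat=True)
        return Response(list(values))  # Response serializes the list to JSON itself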
You will then be able to reach it at a URL like:
/api/model/sample_field/?field=foo
This extra method on the viewset creates a new endpoint under the sample endpoint. Since it's a list route (@list_route(), or @action(detail=False) in newer DRF versions), you reach it by appending /sample_field/ to the list URL.
So following your code, it would be, for example:
mysite.com/sample/sample_field/?field=year
There are many interesting details in your question, but with this sample I think you might be able to achieve what you want.
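The whitelist check matters because the field name comes straight from the query string. Under the hood, values_list(field, flat=True) asks the database for a single column, and column names cannot be bound as query parameters, so an unchecked name would end up inside the SQL text. A minimal sketch with SQLite from the standard library; the table layout and the ALLOWED_FIELDS set are assumptions based on the question's Sample model.

```python
import sqlite3

# Hypothetical subset of the Sample columns; in Django this set could be built
# from Sample._meta.get_fields().
ALLOWED_FIELDS = {"index", "year", "name"}


def field_values(conn, field):
    """Return every value of one column, like values_list(field, flat=True)."""
    # Identifiers cannot be passed as SQL parameters, so validate against a whitelist
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"unknown field: {field!r}")
    # "index" is an SQL keyword, so identifiers are double-quoted
    return [row[0] for row in conn.execute(f'SELECT "{field}" FROM sample ORDER BY "index"')]
```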
Try using pagination. You can set it up in almost the same way as in your question. Pagination lets you divide the results into pages, so you don't have to display all the entries on the same page. I think this is the best option for you.
Refer to the Django documentation on pagination.
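Page-number pagination boils down to slicing: page n with k items per page covers items (n - 1) * k up to n * k, and slicing a Django queryset that way becomes LIMIT/OFFSET in the SQL. A minimal sketch of that arithmetic in plain Python; paginate is a hypothetical helper, not Django's Paginator API.

```python
import math


def paginate(items, page, per_page):
    """Return (items on the 1-based page, total number of pages)."""
    num_pages = max(1, math.ceil(len(items) / per_page))
    if not 1 <= page <= num_pages:
        raise ValueError(f"page must be between 1 and {num_pages}")
    start = (page - 1) * per_page
    return items[start:start + per_page], num_pages
```

In Django REST framework the equivalent is switched on for list endpoints through the DEFAULT_PAGINATION_CLASS and PAGE_SIZE settings, so the browser only receives one page of the 100,000 rows at a time.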
I have found related answers, but cannot find anything covering my specific need. I have only been using Django for about 2 weeks. I have tried the tutorial and the Django documentation.
Background: I have a database with different funds. Each fund can have different performance periods. In each performance period there can be different series. Each performance period can have flows for all the series defined within it.
I have set up the relational DB so that it filters down like that. The problem is that series names get re-used across performance periods (there is a separate unique key).
I have a models file that looks something like this:
class SeriesFlow(models.Model):
    # String reference because Series is defined below; CASCADE was the implicit default before Django 2.0
    series = models.ForeignKey('Series', on_delete=models.CASCADE)
    date = models.DateField('date for flow')
    value = models.FloatField('Flow pos inflow, neg outflow')
    def __str__(self):
        return str(self.series)
class Series(models.Model):
    perf_period = models.ForeignKey('PerformancePeriod', on_delete=models.CASCADE)
    series = models.CharField(max_length=100)
    series_longname = models.CharField(max_length=200, blank=True)
    # more fields
    def __str__(self):
        return self.series
In the admin.py file I do the following relevant things:
class SeriesFlowAdmin(admin.ModelAdmin):
    fields = ['series', 'date', 'value']
    list_display = ['series', 'date', 'value']
    list_filter = ['series__perf_period']  # nice __ syntax to span relationships
# and then registering the admin interfaces
admin.site.register(Fund, FundAdmin)
admin.site.register(PerformancePeriod, PerformancePeriodAdmin)
admin.site.register(Profit, ProfitAdmin)
admin.site.register(Series, SeriesAdmin)
admin.site.register(SeriesFlow, SeriesFlowAdmin)
The admin form allows me to filter series flows by performance period, which is what I want to do. When I try to add a series flow I get the three fields that I want to enter: series, date and value. The problem is that the dropdown gives the options for all the series in the database. I want to filter the dropdown on the series flow entry page so that it only gives the relevant series. The series names displayed get reused between different funds and performance periods, so the dropdown is a mess! The filtered performance period is in the URL of the form, so it is definitely available. I just can't figure out how to filter on it.
The URLs for the filtered series flow list and the flow entry form are:
admin/fee/seriesflow/?series__perf_period__id__exact=3
admin/fee/seriesflow/add/?_changelist_filters=series__perf_period__id__exact%3D3
So the filter is definitely still available; now I want to make sure that just the relevant series are displayed. Currently, series from other performance periods are also displayed in the drop-down.
I don't fully understand your model relationships, but I think formfield_for_foreignkey is what you need:
class SeriesFlowAdmin(admin.ModelAdmin):
    def formfield_for_foreignkey(self, db_field, request, **kwargs):
        if db_field.name == 'series':
            kwargs['queryset'] = Series.objects.filter(series='xxx')  # replace with your own filter
        return super().formfield_for_foreignkey(db_field, request, **kwargs)  # let the admin build the field
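The performance period id can be read from the _changelist_filters parameter that the admin appends to the add-form URL, as the URL in the question shows. Its value is itself a query string, so it is parsed twice. A minimal sketch with the standard library; both helper names are hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

FILTER_KEY = "series__perf_period__id__exact"


def perf_period_id_from_changelist_filters(changelist_filters: str) -> Optional[str]:
    """Extract the performance-period id from the decoded _changelist_filters value."""
    # The value looks like "series__perf_period__id__exact=3"
    values = parse_qs(changelist_filters).get(FILTER_KEY)
    return values[0] if values else None


def perf_period_id_from_add_url(url: str) -> Optional[str]:
    """Same, but starting from the raw add-form URL, where the inner query is still encoded."""
    outer = parse_qs(urlsplit(url).query)
    filters = outer.get("_changelist_filters", [""])[0]
    return perf_period_id_from_changelist_filters(filters)
```

Inside formfield_for_foreignkey, request.GET.get('_changelist_filters', '') already holds the decoded inner string, so the first helper's result could feed Series.objects.filter(perf_period_id=...) when it is not None, assuming the parameter is still present on the request being handled.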
I'm working on a web project with Django and MongoDB as my database (using MongoEngine to connect them).
I have to create a Celery task to clean up old user accounts. I need to clean up only lazy user accounts that still have no content after one month (lazy users are users created automatically when they first connect to the website). What counts as content? Any posts from the user, or comments on any of the posts.
I did it like this, but I want to turn this into a query if possible:
def clean_inactive_lazy_users():
    users_with_content = []
    for post in api_models.Post.objects:
        users_with_content.append(post.author)
        for comment in post.comments:
            users_with_content.append(comment.author)
    users_with_content = list(set(users_with_content))
    for user in account_models.User.objects:
        if not user.is_authenticated() and (timezone.now() - user.connection_last_unsubscribe).days >= settings.DELETE_LAZY_USER_AFTER_DAYS and user not in users_with_content:
            user.delete()
The models look like this:
base.py
class AuthoredEmbeddedDocument(mongoengine.EmbeddedDocument):
    author = mongoengine.ReferenceField(models.User, required=True)
class AuthoredDocument(mongoengine.Document):
    author = mongoengine.ReferenceField(models.User, required=True)
api_models:
from . import base
class Comment(base.AuthoredEmbeddedDocument):
    """
    This class defines document type for comments on posts.
    """
class Post(base.AuthoredDocument):
    """
    This class defines document type for posts.
    """
account_models:
class User(auth.User):
    def is_authenticated(self):
        return self.has_usable_password()
Hopefully I provided enough information so you can help me with the problem. Thanks!
I think there are a couple ways for you to clean this up.
You could get the unique ids of all post authors with something like:
user_ids_with_posts_list = Post.objects.no_dereference().distinct('author')
distinct() returns the unique values of a field, and no_dereference() keeps them as raw author ids instead of loading User documents. This pushes what you are doing in Python to MongoDB.
You could then construct a query for Users. You would have to turn your days-ago check into a date. What condition is has_usable_password checking for?
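from datetime import timedelta
start_time = timezone.now() - timedelta(days=settings.DELETE_LAZY_USER_AFTER_DAYS)
user_ids_with_comments_list = Post.objects.no_dereference().distinct('comments.author')  # comment authors count as content too
invalid_users = User.objects.filter(connection_last_unsubscribe__lte=start_time,
    password=None,  # assumption: lazy users have no stored password; match whatever has_usable_password() checks
    pk__nin=list(user_ids_with_posts_list) + list(user_ids_with_comments_list))
invalid_users.delete()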
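The selection logic itself, the same conditions the query above expresses, can be checked in plain Python. Users and posts here are hypothetical dicts standing in for the MongoEngine documents, and has_password stands in for has_usable_password(). Note that "idle for at least N whole days" is the same as "last unsubscribe at or before now minus N days", which is what lets the query use a single date comparison.

```python
from datetime import datetime, timedelta


def stale_lazy_user_ids(users, posts, now, max_age_days):
    """Return ids of lazy users idle for at least max_age_days who never posted or commented.

    users: dicts with "id", "has_password" and "last_unsubscribe" (datetime)
    posts: dicts with "author" (user id) and "comments" (list of dicts with "author")
    """
    # The same set the original loops build, as set comprehensions
    ids_with_content = {post["author"] for post in posts}
    ids_with_content |= {c["author"] for post in posts for c in post["comments"]}
    cutoff = now - timedelta(days=max_age_days)
    return [
        user["id"] for user in users
        if not user["has_password"]
        and user["last_unsubscribe"] <= cutoff
        and user["id"] not in ids_with_content
    ]
```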
I have a model, and I want every user of the system to have a table reserved for them, based on this model.
To make it clear:
Imagine the Model "Games".
I do not want a single "games" table; instead I want:
foo_games, bar_games (foo / bar are users of the system)
How can I do this?
Edit:
Why?
Imagine I have 1000 users, and each user has 100 games.
Do you think having one table with 1000 * 100 items is better than having 1000 tables with 100 items each?
The way this is typically handled with the Django ORM is by linking the two models (tables) together with a ForeignKey. You can then get just the records that belong to a user by using the .filter() method. In this way it will seem like each user has their own table. For example:
from django.contrib.auth.models import User
from django.db import models
class Game(models.Model):
    name = models.CharField(max_length=50)
    owner = models.ForeignKey(User, on_delete=models.CASCADE)
The ForeignKey field here provides a "link" that relates 1 Game record to a specific User.
When you want to retrieve the Games that apply just to 1 user, you can do so like this:
# Select whichever user you want to (any of these work)
user = User.objects.get(username='admin')
user = User.objects.get(id=64)
user = request.user
# Then filter by that user
user_games = Game.objects.filter(owner=user)
Edit --
To answer your question about more rows vs. more tables: relational database servers are optimized to hold a huge number of rows in a single table. In your example, 1000 * 100 is only 100,000 records, a tiny fraction of what a single table can hold (server memory and storage aside). Django also creates a database index on ForeignKey columns by default, so filtering by owner stays fast as the table grows.
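To make the rows-versus-tables point concrete, the sketch below puts the question's 1000 users with 100 games each into one SQLite table (standard library; the table and index names are hypothetical) and asks the query planner how it would fetch one user's games. The per-owner lookup goes through the index rather than scanning all 100,000 rows.

```python
import sqlite3


def build_games_db(num_users=1000, games_per_user=100):
    """One shared table for all users' games, indexed by owner."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE game (id INTEGER PRIMARY KEY, name TEXT, owner_id INTEGER NOT NULL)")
    # Comparable to the index Django creates for a ForeignKey column by default
    conn.execute("CREATE INDEX game_owner_idx ON game (owner_id)")
    conn.executemany(
        "INSERT INTO game (name, owner_id) VALUES (?, ?)",
        ((f"game {g}", u) for u in range(1, num_users + 1) for g in range(games_per_user)),
    )
    return conn


def games_for(conn, owner_id):
    """Roughly the SQL behind Game.objects.filter(owner=user)."""
    return conn.execute("SELECT id, name FROM game WHERE owner_id = ?", (owner_id,)).fetchall()


def query_plan(conn):
    """Return the planner's description of the per-owner lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT id, name FROM game WHERE owner_id = ?", (1,)
    ).fetchall()
    return [row[-1] for row in rows]  # the last column holds the human-readable detail
```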