Django project architecture advice - django

I have a Django project and I have a Post model which looks like this:
from django.db import models
from django.urls import reverse

class BasicPost(models.Model):
    author = models.ForeignKey('auth.User', on_delete=models.CASCADE)
    published = models.BooleanField(default=False)
    created_date = models.DateTimeField(auto_now_add=True)
    title = models.CharField(max_length=100, blank=False)
    body = models.TextField(max_length=999)
    media = models.ImageField(blank=True)

    def get_absolute_url(self):
        return reverse('basic_post', args=[str(self.pk)])

    def __str__(self):
        return self.title
I also use the default User model that ships with Django.
I want to record which posts each user has read so I can send them posts they haven't read.
My question is: what is the best way to do this? If I use a many-to-many field, should I put it on the User model and store all the posts each user has read, or should I go in the other direction and put the many-to-many field on the Post model, storing which users have read each post?
There are going to be more than 1 million posts in the Post model and about 50,000 users, and I want the most efficient filter for returning unread posts to a user.
If I should use the first option, how do I extend the User model?
Thanks!

On your first question (which way to go): I believe that ManyToManyField by default creates indexes in the DB for both foreign keys. Therefore, wherever you put the relation, on User or on BasicPost, both the direct and the reverse relationship work through an index. Django will create a join table for you with three columns, something like (id, user_id, basic_post_id). Every access to this table goes through an index on user_id or basic_post_id, and each (user_id, basic_post_id) pair is kept unique. So it's really within your application that you decide whether you filter starting from the 1 million posts or from the 50k users.
On your second question (how to extend User): it's generally recommended to subclass the user model (a custom user model) from the very beginning. If it's too late and your project is too far advanced for that, you can do this in your models.py:
from django.contrib.auth.models import User
from django.db import models

class BasicPost(models.Model):
    # your code
    readers = models.ManyToManyField('auth.User', related_name="posts_already_read")

# "manually" add a method to the User class
def _unread_posts(user):
    # exclude() with a single user drops every post that user has read
    return BasicPost.objects.exclude(readers=user)

User.unread_posts = _unread_posts
Haven't run this code though! Hope this helps.

Could you have a separate ReadPost model instead of a potentially large m2m, which you could save when a user reads a post? That way you can just query the ReadPost models to get the data, instead of storing it all in the blog post.
Maybe something like this:
from django.utils import timezone

class UserReadPost(models.Model):
    user = models.ForeignKey("auth.User", on_delete=models.CASCADE, related_name="read_posts")
    seen_at = models.DateTimeField(default=timezone.now)
    post = models.ForeignKey(BasicPost, on_delete=models.CASCADE, related_name="read_by_users")
You could add a unique_together constraint to make sure that only one UserReadPost object is created for each user and post (to make sure you don't count any twice), and use get_or_create() when creating new records.
Then finding the posts a user has read is:
posts = UserReadPost.objects.filter(user=current_user).values_list("post", flat=True)
This could also be extended relatively easily. For example, if your BasicPost objects can be edited, you could add an updated_at field to the post. Then you could compare the seen_at of the UserReadPost record with the updated_at of the BasicPost to check whether the user has seen the updated version.
The downside is that you'd be creating a lot of rows in the DB for this table.
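To show what the "unread" filter looks like against such a table, here is a minimal sketch using plain sqlite3 instead of Django. The table and function names are hypothetical stand-ins for BasicPost and UserReadPost, not anything Django generates. With the models above, the ORM equivalent should be roughly BasicPost.objects.exclude(read_by_users__user=current_user).

```python
import sqlite3

# Hypothetical, simplified tables standing in for BasicPost and UserReadPost.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE user_read_post (
        user_id INTEGER NOT NULL,
        post_id INTEGER NOT NULL REFERENCES post(id),
        UNIQUE (user_id, post_id)  -- same role as unique_together
    );
""")
conn.executemany(
    "INSERT INTO post (id, title) VALUES (?, ?)",
    [(1, "first"), (2, "second"), (3, "third")],
)

def mark_read(user_id, post_id):
    # INSERT OR IGNORE plays the role of get_or_create(): a repeat read is a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO user_read_post (user_id, post_id) VALUES (?, ?)",
        (user_id, post_id),
    )

def unread_post_ids(user_id):
    # Anti-join: keep only posts that have no read record for this user.
    rows = conn.execute(
        """
        SELECT p.id FROM post p
        WHERE NOT EXISTS (
            SELECT 1 FROM user_read_post r
            WHERE r.post_id = p.id AND r.user_id = ?
        )
        ORDER BY p.id
        """,
        (user_id,),
    )
    return [row[0] for row in rows]

mark_read(7, 2)
mark_read(7, 2)  # duplicate, ignored thanks to the unique constraint
unread = unread_post_ids(7)
```

The unique constraint on (user_id, post_id) also gives the database an index that starts with user_id, which is what the NOT EXISTS lookup needs.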

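If you show your posts in chronological order (by created_date, for example), one option is to extend the user model with a latest_read_post_id field. This assumes users read posts in order, so everything up to that id counts as read.
In that case:
class BasicPost(models.Model):
    # your code
    def is_read_by(self, user):
        # <= because the post stored in latest_read_post_id has itself been read
        return self.id <= user.latest_read_post_id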
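A small plain-Python sketch of that "high-water mark" idea may make the trade-off clearer. The Reader class and the helper names are hypothetical; only latest_read_post_id comes from the answer.

```python
from dataclasses import dataclass

@dataclass
class Reader:
    # Hypothetical stand-in for a user row carrying the extra field.
    latest_read_post_id: int = 0  # 0 means nothing has been read yet

def mark_read(reader, post_id):
    # Only move the marker forward; rereading an older post changes nothing.
    reader.latest_read_post_id = max(reader.latest_read_post_id, post_id)

def unread_ids(reader, all_post_ids):
    # Everything above the marker counts as unread.
    return [pid for pid in all_post_ids if pid > reader.latest_read_post_id]

reader = Reader()
mark_read(reader, 3)
mark_read(reader, 2)
remaining = unread_ids(reader, [1, 2, 3, 4, 5])
```

Note that post 1 was never opened here but is still treated as read. That is the price of storing one integer per user instead of one row per user and post.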

Related

How to get the first record of a 1-N relationship from the main table with Django ORM?

I have a Users table which is FK to a table called Post. How can I get only the last Post that the user registered? The intention is to return a list of users with the last registered post, but when obtaining the users, if the user has 3 posts, the user is repeated 3 times. I'm interested in only having the user once. Is there an alternative that is not unique?
class User(models.Model):
    name = models.CharField(max_length=50)

class Post(models.Model):
    title = models.CharField(max_length=50)
    user = models.ForeignKey(User, on_delete=models.CASCADE, related_name='posts', related_query_name='posts')
    created = models.DateTimeField(default=timezone.now)

    class Meta:
        get_latest_by = 'created'
        ordering = ['-created']
I already tried select_related and prefetch_related, but I keep getting duplicate user rows when a user has multiple posts.
user = User.objects.select_related('posts').all().values_list('id', 'name', 'posts__title', 'posts__created')
This does give me the answer I want, but when I change the created field to sort by date, I don't get the newest record, I always get the oldest.
user = User.objects.select_related('posts').all().values_list('id', 'name', 'posts__title', 'posts__created').distinct('id')
I'm trying to do it without resorting to doing a record-by-record for and getting the most recent Post. I know that this is an alternative but I'm trying to find a way to do it directly with the Django ORM, since there are thousands of records and a for is less than optimal.
In that case your Django ORM query would first filter posts by user then order by created in descending order and get the first element of the queryset.
last_user_post = Post.objects.filter(user__id=1).order_by('-created').first()
Alternatively, you can use a user instance:
user = User.objects.get(id=1)
last_user_post = Post.objects.filter(user=user).order_by('-created').first()
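The queries above return the latest post for one user. For the whole list of users in a single query, one common shape is a correlated subquery per user. The sketch below shows that shape in plain sqlite3 with hypothetical table names and sample data. In the Django ORM the same idea should look roughly like User.objects.annotate(latest_title=Subquery(Post.objects.filter(user=OuterRef('pk')).order_by('-created').values('title')[:1])), which I have not run against these models.

```python
import sqlite3

# Hypothetical tables and rows standing in for the User and Post models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES app_user(id),
        title TEXT,
        created TEXT
    );
    INSERT INTO app_user VALUES (1, 'ana'), (2, 'ben'), (3, 'cy');
    INSERT INTO post VALUES
        (1, 1, 'old',    '2023-01-01'),
        (2, 1, 'newest', '2023-03-01'),
        (3, 2, 'only',   '2023-02-01');
""")

# One row per user; the scalar subquery picks that user's most recent post title.
latest_per_user = conn.execute("""
    SELECT u.id, u.name,
           (SELECT p.title FROM post p
            WHERE p.user_id = u.id
            ORDER BY p.created DESC, p.id DESC
            LIMIT 1) AS latest_title
    FROM app_user u
    ORDER BY u.id
""").fetchall()
```

Because the subquery yields at most one value per user, users are never repeated, and a user without posts simply gets a NULL title.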

Dynamically count vs database record

I have a post model as below, and I currently use number_of_likes to record how many likes a post has. This means I have to maintain the number_of_likes field manually.
I added this field to Post for two main reasons, and I would like to hear your advice:
1. It is easy to write the serialisation using declarative syntax (every post needs this).
2. I don't need to filter and count on the Like model, which is more expensive than just reading the value from a field.
class Post(models.Model):
    ...
    number_of_likes = models.IntegerField()

class Like(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    post = models.ForeignKey(Post, on_delete=models.CASCADE)
I would like to know which method is better: using Like.objects.filter(post=post).count() or maintaining a new field such as number_of_likes. If I choose the latter, what is the best way to maintain this field?
As @WillemVanOnsem suggested, the best way to display this data is with an annotation. For example:
from django.db.models import Count

posts = Post.objects.annotate(num_of_likes=Count('like'))

# usage
for post in posts:
    print(post.num_of_likes)

# or
posts.values('pk', 'num_of_likes')
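For reference, an annotation like this is computed by the database in one grouped query rather than one count per post. The plain sqlite3 sketch below shows the general shape of such a query. The table names and rows are made up for illustration, and the exact SQL Django emits may differ.

```python
import sqlite3

# Hypothetical tables and rows standing in for Post and Like.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY);
    CREATE TABLE likes (user_id INTEGER, post_id INTEGER REFERENCES post(id));
    INSERT INTO post VALUES (1), (2), (3);
    INSERT INTO likes VALUES (10, 1), (11, 1), (10, 2);
""")

# LEFT JOIN keeps posts without likes; COUNT(l.post_id) skips the NULLs, giving 0.
like_counts = conn.execute("""
    SELECT p.id, COUNT(l.post_id) AS num_of_likes
    FROM post p
    LEFT JOIN likes l ON l.post_id = p.id
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()
```

With an index on the like table's post column this stays cheap for a page of posts, so a hand-maintained counter is mostly worth it when the count is read far more often than likes are written.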

Django change multiple model entries

I have a model containing various entries tied to one user, and I want to give the user a view where they can review these entries, select some of them and perform an action on the selection, something like the admin interface has. I have tried UpdateView, but that is for one entry only. ListView doesn't like that the model returns multiple entries for one identifier. Is there something else I could use?
EDIT:
Below is the model, I am talking about. A user will have multiple model entries and I just want a view that lists these multiple entries and allows the user to perform a bulk action on them, like delete ...
class UserData(models.Model):
    class Meta:
        app_label = "app"

    user_id = models.IntegerField()
    name = models.CharField(_("Name"), max_length=100)
    latdeg = models.IntegerField(_('Latitude'))
    latmin = models.IntegerField(_('Latitude'), validators=[validate_60])
    londeg = models.IntegerField(_('Longitude'))
    lonmin = models.IntegerField(_('Longitude'), validators=[validate_60])
    main = models.BooleanField()

    def __unicode__(self):
        # user_id is an integer, so convert it before concatenating
        return str(self.user_id) + "-" + self.name
I think what you are looking for is inlineformset_factory
Since you have not given any example, I suggest you look at the example of One author, multiple books as given in this SO post.

Optimizing Django queryset related comparisons

I have a Django app where users upload photos, and leave comments under them. The data models to reflect these objects are Photo and PhotoComment respectively.
There's a third data model called PhotoThreadSubscription. Whenever a user comments under a photo, the user is subscribed to that particular thread via creating an object in PhotoThreadSubscription. This way, he/she can be apprised of comments left in the same thread by other users subsequently.
class PhotoThreadSubscription(models.Model):
    viewer = models.ForeignKey(User)
    viewed_at = models.DateTimeField(db_index=True)
    which_photo = models.ForeignKey(Photo)
Every time a user comments under a photo, I update the viewed_at attribute of the user's PhotoThreadSubscription object for that particular photo. Any comments by other users that have a submission time of greater than viewed_at for that particular thread are therefore new.
Suppose I have a queryset of comments, all belonging to unique photos that never repeat. I want to traverse through this queryset and find the latest unseen comment.
Currently, I'm trying this in a very DB heavy way:
latest_unseen_comment = PhotoComment(id=1)  # i.e. a very old comment
for comment in comments:
    if comment.submitted_on > PhotoThreadSubscription.objects.get(viewer=user, which_photo_id=comment.which_photo_id).viewed_at and comment.submitted_on > latest_unseen_comment.submitted_on:
        latest_unseen_comment = comment
This is obviously not a good way to do it. For one, I don't want to do DB calls in a for loop. How do I manage the above in one call? Specifically, how do I get the relevant PhotoThreadSubscription queryset in one call, and next, how do I use that to calculate the max_unseen_comment? I'm highly confused right now.
class Photo(models.Model):
    owner = models.ForeignKey(User)
    image_file = models.ImageField(upload_to=upload_photo_to_location, storage=OverwriteStorage())
    upload_time = models.DateTimeField(auto_now_add=True, db_index=True)
    latest_comment = models.ForeignKey(blank=True, null=True, on_delete=models.CASCADE)

class PhotoComment(models.Model):
    which_photo = models.ForeignKey(Photo)
    text = models.TextField(validators=[MaxLengthValidator(250)])
    submitted_by = models.ForeignKey(User)
    submitted_on = models.DateTimeField(auto_now_add=True)
Please ask for clarification if the question seemed hazy.
I think this will do it in a single query:
from django.db.models import F

latest_unseen_comment = (
    comments.filter(
        which_photo__photothreadsubscription__viewer=user,
        which_photo__photothreadsubscription__viewed_at__lt=F("submitted_on"),
    )
    .order_by("-submitted_on")
    .first()
)
The key here is using F expressions so that the comparison can be done with each comment's individual date, rather than using a single date hardcoded in the query. After filtering the queryset to only include the comments that are unseen, we then order_by the date of the comment and take the first one.
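If the comparison cannot be pushed into the database, the loop from the question can still be done with a single subscription query by loading the viewed_at values into a dict first. The sketch below uses plain tuples and made-up dates as stand-ins for the model instances. In Django the dict could probably be built with something like dict(PhotoThreadSubscription.objects.filter(viewer=user, which_photo__in=photo_ids).values_list('which_photo_id', 'viewed_at')), where photo_ids is a hypothetical list of the photo ids in the comment queryset.

```python
from datetime import datetime

# Hypothetical plain-data stand-ins: (photo_id, submitted_on) per comment.
comments = [
    (1, datetime(2024, 1, 5)),
    (2, datetime(2024, 1, 9)),
    (3, datetime(2024, 1, 7)),
]

# One lookup table built from a single subscriptions query: photo_id -> viewed_at.
viewed_at = {
    1: datetime(2024, 1, 6),   # thread viewed after the comment, so not new
    2: datetime(2024, 1, 10),  # thread viewed after the comment, so not new
    3: datetime(2024, 1, 1),   # comment arrived after the last view
}

def latest_unseen(comments, viewed_at):
    # Keep comments newer than the viewer's last visit to that photo's thread.
    unseen = [
        c for c in comments
        if c[0] in viewed_at and c[1] > viewed_at[c[0]]
    ]
    # default=None avoids needing a dummy "very old" comment as a starting value.
    return max(unseen, key=lambda c: c[1], default=None)

result = latest_unseen(comments, viewed_at)
```

This costs one extra query and some Python work, so the F expression version is usually preferable when the queryset can be filtered directly.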

Filter M2M in template?

In my model, I have the following M2M field
class FamilyMember(AbstractUser):
    ...
    email_list = models.ManyToManyField('EmailList', verbose_name="Email Lists", blank=True, null=True)
    ...
The EmailList table looks like this:
class EmailList(models.Model):
    name = models.CharField(max_length=50, default='My List')
    description = models.TextField(blank=True)
    is_active = models.BooleanField(verbose_name="Active")
    is_managed_by_user = models.BooleanField(verbose_name="User Managed")
In the app, the user should only see records that is_active=True and is_managed_by_user=True.
In the Admin side, the admin should be able to add a user to any/all of these groups, regardless of the is_active and is_managed_by_user flag.
What happens is that the Admin assigns a user to all of the email list records. Then, the user logs in and can only see a subset of the list (is_active=True and is_managed_by_user=True). This is expected behavior. However, what comes next is not.
The user deselects an email list item and then saves the record. Since M2M_Save first clears all of the m2m records before it calls save() I lose all of the records that the Admin assigned to this user.
How can I keep those? I've tried creating multiple lists and then merging them before the save, I've tried passing the entire list to the template and then hiding the ones where is_managed_by_user=False, and I just can't get anything to work.
What makes this even more tricky for me is that this is all wrapped up in a formset.
How would you go about coding this? What is the right way to do it? Do I filter out the records that the user shouldn't see in my view? If so, how do I merge those missing records before I save any changes that the user makes?
You might want to try setting up a model manager in your models.py to take care of the filtering. You can then call the filter in your views.py like so:
models.py:
class EmailListQuerySet(models.query.QuerySet):
    def active(self):
        return self.filter(is_active=True)

    def managed_by_user(self):
        return self.filter(is_managed_by_user=True)

class EmailListManager(models.Manager):
    def get_queryset(self):
        return EmailListQuerySet(self.model, using=self._db)

    def get_active(self):
        return self.get_queryset().active()

    def get_all(self):
        return self.get_queryset().active().managed_by_user()

class EmailList(models.Model):
    name = models.CharField(max_length=50, default='My List')
    description = models.TextField(blank=True)
    is_active = models.BooleanField(verbose_name="Active")
    is_managed_by_user = models.BooleanField(verbose_name="User Managed")
    objects = EmailListManager()
views.py:
def view(request):
    email = EmailList.objects.get_all()
    return render(request, 'template.html', {'email': email})
Obviously my example makes assumptions about your data, and you are more than welcome to change the variables and filters according to your needs. However, I hope the above gives you an idea of the possibilities you can try.
In your views you could instead write email = EmailList.objects.filter(is_active=True, is_managed_by_user=True), which runs the same query, but the filter then has to be repeated everywhere it is needed. The model manager keeps that logic in one place. Additionally, it does not rely on what the user does, so both the admin and the user interface talk to the model directly (keeping them in sync).
Note: The example above is typed directly into this answer and has not been validated in a text editor. I apologize if there are some syntax or typo errors.
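The manager handles what the user sees, but the question also asks how to keep the admin-assigned lists when the user saves. One way to think about that merge is as set arithmetic on primary keys: keep whatever the user could not see, and take the user's choices only among what they could see. The function and variable names below are hypothetical, and this is a sketch of the logic only, not of the formset wiring.

```python
def merged_selection(current_ids, visible_ids, submitted_ids):
    """Return the membership set to save after a user edits only the visible lists."""
    current, visible, submitted = set(current_ids), set(visible_ids), set(submitted_ids)
    hidden_kept = current - visible   # admin-only lists stay untouched
    chosen = submitted & visible      # ignore ids the user was never shown
    return hidden_kept | chosen

# The admin assigned lists 1-4; the user can see only 1 and 2, and deselects 2.
current = {1, 2, 3, 4}
visible = {1, 2}
submitted = {1}
result = merged_selection(current, visible, submitted)
```

In a view, the resulting ids could then be written back in one step, for example with the related manager's set() method, instead of letting the form save only the visible subset.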